

ENTROPY  ANALYSIS  OF  FEEDBACK 
FLIGHT  DYNAMIC  CONTROL  SYSTEMS 


UNIVERSITY OF CALIFORNIA, LOS ANGELES
SCHOOL OF ENGINEERING AND APPLIED SCIENCE
7620 BOELTER HALL, UCLA
LOS ANGELES, CALIFORNIA 90024


January  1979 


TECHNICAL  REPORT  AFFDL-TR-78-123 
Final  Report  for  Period  May  1977-May  1978 


Approved  for  public  release;  distribution  unlimited. 


AIR  FORCE  FLIGHT  DYNAMICS  LABORATORY 
AIR  FORCE  WRIGHT  AERONAUTICAL  LABORATORIES 
AIR  FORCE  SYSTEMS  COMMAND 
WRIGHT-PATTERSON  AIR  FORCE  BASE,  OHIO  45433 




When Government drawings, specifications, or other data are used
for any purpose other than in connection with a definitely related
Government procurement operation, the United States Government thereby
incurs no responsibility nor any obligation whatsoever; and the fact
that the government may have formulated, furnished, or in any way
supplied the said drawings, specifications, or other data, is not to be
regarded by implication or otherwise as in any manner licensing the
holder or any other person or corporation, or conveying any rights or
permission to manufacture, use, or sell any patented invention that
may in any way be related thereto.


Gerald W. Francis

JAMES W. MORRIS, Chief
Control Systems Development Branch
Flight Control Division

FOR THE COMMANDER

MORRIS A. OSTGAARD
Assistant for Research and Technology
Flight Control Division


If your address has changed, if you wish to be removed from our
mailing list, or if the addressee is no longer employed by your
organization please notify AFFDL/FGL, W-PAFB, OH 45433 to help us
maintain a current mailing list.


Copies of this report should not be returned unless return is required by
security considerations, contractual obligations, or notice on a specific document.



4. TITLE (and Subtitle)

ENTROPY ANALYSIS OF FEEDBACK FLIGHT DYNAMIC CONTROL SYSTEMS

5. TYPE OF REPORT & PERIOD COVERED

FINAL REPORT 5/77-5/78

6. PERFORMING ORG. REPORT NUMBER

9. PERFORMING ORGANIZATION NAME AND ADDRESS


5/1/78 


University  of  California,  Los  Angeles 

School of Engineering and Applied Science

7620  Boelter  Hall,  UCLA,  Los  Angeles,  CA  90024 


11. CONTROLLING OFFICE NAME AND ADDRESS
United States Air Force
Air Force Systems Command

ASD/FGL WPAFB, OH 45433  Gerald W. Francis

14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)    15. SECURITY CLASS. (of this report)

Office  of  Naval  Research  UNCLASSIFIED 

1030 E. Green Street


Pasadena,  CA  91106  Attn:  Mr.  Perry  belike 


16. DISTRIBUTION STATEMENT (of this Report)

Approved for public release, distribution unlimited

15a. DECLASSIFICATION/DOWNGRADING SCHEDULE

17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)

19. KEY WORDS (Continue on reverse side if necessary and identify by block number)

Entropy function, Flight Control Systems, Aircraft Sensors

20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

This report is a study of the application of the entropy function of Information
Theory to the analysis of sampled data systems so characteristic of the evolving
and important field of digital flight control, including multimode systems.

The systems studied are both feed-forward and feedback with the emphasis placed
on the regulator problem. The feature common to all the configurations is the
presence of a sensor which measures the input signal and which has an output
that is usually some random function of the input. For the purposes of the
analysis it is convenient to describe the behavior of the sensor by its

Sensor  Channel  Transmittance,  which  is  defined  as  the  mutual  information 
between  the  input  and  output  of  the  measuring  device.  This  quantity  is 
not  independent  of  the  properties  of  the  input  signal;  however,  in  any 
given  problem  it  need  only  be  calculated  once. 

For  proofs  relating  to  the  estimation  problem  the  constraints  on 
the  system  elements  are  so  relaxed  that  the  sensor  model  need  not  be 
known  and  the  only  description  of  this  device  that  is  required  is  its 
Sensor  Channel  Transmittance.  This  implies  that  processors  not  having 
acceptable  models,  such  as  human  operators,  may  now  be  successfully 
studied  through  the  use  of  entropy  analysis.  For  the  feedback  problem 
the  restrictions  are  somewhat  tighter  in  that  although  the  sensor  data 
processing  may  be  nonlinear  the  noise  must  be  constrained  to  be  additive. 
However, this class of sensors is very important and the theory of
feedback control is advanced significantly.

The major result of the report is that when a sensor path (either
feed-forward or feedback) is used to improve the performance of a system such
as  a flight  control  system,  the  entropy  of  the  system  error  can  never  be 
reduced  by  more  than  an  amount  equal  to  the  Sensor  Channel  Transmittance. 
This  approach  leads  to  the  determination  of  the  optimum  system  performance 
by  using  only  open  loop  quantities  that  are  easily  determined.  None  of 
the  results  involve  calculating  the  optimum  filter  that  must  be  used  to 
achieve  the  minimum  entropy  performance. 

This research is important because it imbeds the control problem
in the communication problem and clearly demonstrates the manner in which
the information handling capability of the system elements limits
performance, and is, therefore, of considerable potential significance to
advanced flight dynamic systems.



PREFACE 


This report is a study of the application of the entropy function
of Information Theory to the analysis of sampled data systems so
characteristic of the evolving and important field of digital control,
including multimode systems. The systems studied are both feed-forward
and feedback with the emphasis placed on the regulator problem. The
feature  common  to  all  the  configurations  is  the  presence  of  a sensor 
which  measures  the  input  signal  and  which  has  an  output  that  is  usually 
some  random  function  of  the  input.  For  the  purposes  of  the  analysis  it 
is  convenient  to  describe  the  behavior  of  the  sensor  by  its  Sensor 
Channel  Transmittance,  which  is  defined  as  the  mutual  information 
between  the  input  and  output  of  the  measuring  device.  This  quantity 
is  not  independent  of  the  properties  of  the  input  signal;  however,  in 
any  given  problem  it  need  only  be  calculated  once. 

For proofs relating to the estimation problem the constraints on
the system elements are so relaxed that the sensor model need not be
known and the only description of this device that is required is its
Sensor Channel Transmittance. This implies that processors not having
acceptable models, such as human operators, may now be successfully
studied through the use of entropy analysis. For the feedback problem
the restrictions are somewhat tighter in that although the sensor data
processing may be nonlinear the noise must be constrained to be additive.
However, this class of sensors is very important and the theory of
feedback control is advanced significantly.

The major result of the report is that when a sensor path (either
feed-forward or feedback) is used to improve the performance of a system
such as a flight control system, the entropy of the system error can
never be reduced by more than an amount equal to the Sensor Channel
Transmittance. This approach leads to the determination of the optimum
system performance by using only open loop quantities that are easily
determined. None of the results involve calculating the optimum filter
that must be used to achieve the minimum entropy performance.
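
The bound stated above can be illustrated with a minimal numerical sketch.
It assumes a scalar Gaussian signal observed through an additive Gaussian
noise sensor, a special case chosen here only because every quantity has a
closed form; the function names and the variances are illustrative and do
not come from the report. In this case the best estimate reduces the error
entropy by exactly the mutual information between sensor input and output.

```python
import math

def gaussian_entropy_bits(variance):
    """Differential entropy of a scalar Gaussian with the given variance, in bits."""
    return 0.5 * math.log2(2.0 * math.pi * math.e * variance)

def sensor_transmittance_bits(signal_var, noise_var):
    """Mutual information I(x; y) for y = x + n, with x and n independent Gaussians."""
    return 0.5 * math.log2(1.0 + signal_var / noise_var)

def min_error_variance(signal_var, noise_var):
    """Error variance of the conditional-mean estimate of x given y."""
    return signal_var * noise_var / (signal_var + noise_var)

# Illustrative (assumed) variances, not values taken from the report.
signal_var, noise_var = 4.0, 1.0

# Entropy before any measurement minus entropy of the best estimation error.
reduction = (gaussian_entropy_bits(signal_var)
             - gaussian_entropy_bits(min_error_variance(signal_var, noise_var)))
transmittance = sensor_transmittance_bits(signal_var, noise_var)

print(f"entropy reduction = {reduction:.4f} bits")
print(f"transmittance     = {transmittance:.4f} bits")
```

Both printed numbers agree, and neither required the estimator itself to
be constructed, only the open loop signal and noise variances.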

This research is important because it imbeds the control problem
in the communication problem and clearly demonstrates the manner in which
the information handling capability of the system elements limits
performance, and is, therefore, of considerable potential significance to
advanced flight dynamic systems.


TABLE OF CONTENTS


I  INTRODUCTION
    Historical Survey
    The Design of Experiments
    One Armed Bandit Problems
    Human Operator Systems
    Systems with Random Parameters
    General Description of Problems Considered in this Dissertation
    Brief Summary of Contents

II  DEVELOPMENT OF ENTROPY AS AN ANALYSIS TOOL
    Why Entropy?
    Properties of Entropy
    Theorem 2.2.1
    Theorem 2.2.2
    Mutual Information
    Theorem 2.3.1
    Coordinate Transformations
    Entropy of Markoff Sources
    The Channel Capacity of a Sensor
    The Sensor Channel Transmittance
    Entropy of the Sum Vector
    Incremental Channel Transmittance

III  THE ESTIMATION PROBLEM
    Naive Application of Entropy Analysis
    Theorem 3.2
    The Estimation Problem; Description
    The Entropy Theorem for Estimation
    Theorem 3.4
    The Channel Transmittance Approach to Estimation
    Theorem 3.5
    Example
    Corollary I to the Estimation Theorem 3.5
    Steady-State Entropy

IV  THE FEEDBACK CONTROL PROBLEM
    Summary
    The Feedback Control Problem; Description
    System Uniqueness
    The Entropy Theorem for Feedback Control
    Lemma
    Theorem 4.4

V  THE TRACKING PROBLEM
    Introduction
    The Tracking (or Servomechanism) Problem; Description
    The Entropy Theorem for Servomechanisms

VI  REAL TIME DATA PROCESSING
    Summary
    The Sequential Channel
    Incremental Sequential Transmittance
    The Sequential Entropy Theorem for Estimation
    Introduction
    Lemma
    Proof of the Theorem for Sequential Estimation
    Theorem 6.4
    Corollary I
    Corollary II to the Sequential Estimation Theorem
    Corollary II
    Gaussian Example
    The Sequential Entropy Theory for Feedback Control Systems
    Introduction
    Lemma
    The Real Time Feedback Entropy Theorem
    Theorem 6.7
    A Generalized Entropy Approach to Feedback Control
    Theorem 6.8
    Corollary to Theorem 6.7

VII  CONTINUOUS TIME DATA PROCESSING
    Discussion of the Difficulties in Solving Time Continuous Systems with Entropy
    Example
    Conclusions

VIII  ENTROPY ANALYSIS OF ADAPTIVE CONTROL
    Introduction
    A Parameter Estimation Problem
    Entropy Solution of the Identification Problem
    Theorem 8.3
    Conclusions

IX  SUMMARY OF RESULTS AND AREAS FOR FUTURE RESEARCH
    Summary of Results
    Areas for Future Research



LIST OF ILLUSTRATIONS


Figure

1.1  The Estimation Problem
1.2  The Feedback Problem
2.1  Interpretation of a Sensor as a Communication Channel
3.1  A Scalar Feedback Problem
3.2  A Typical Graph of H(x) vs. a
3.3  The Estimation Problem
3.4  The Estimation Problem with Generalized Sensor
4.1  The Disturbance Rejecting Feedback Control System (Regulator)
5.1  The Tracking Problem (Servomechanism)
6.1  A Typical Additive Noise Sensor
6.2  The Sequential Estimation Problem
6.3  The Real Time Estimation Problem with Generalized Sensor
6.4  A Modified Estimation Problem
6.5  A Recommended Experiment
6.6  The Sequential Feedback Regulator
7.1  An Identification System



CHAPTER  ONE 
INTRODUCTION 

1.1  Historical  Survey 

Historically, the theory of information began with Claude
Shannon's publication of "The Mathematical Theory of Communications"
[1]. This fundamental work is so profound that even though many
authors are able to prove the Shannon theorems more simply [2,3],
or supply proofs where none were given [4,5,6], or merely reinterpret
the theory in a more useful manner [7,8,9], it is very rare
to find published results that are not based in some way on a
remark or idea generated by Shannon. Because the theory is so young,
much of the significant published research has been directed toward
developing and understanding the mathematical techniques [10,11,12,
13,14,15,16,17] rather than extending the theory to other
applications. However, this survey, and in fact this entire dissertation,
will be primarily interested in the applications of this theory,
which by now appears to have a very rigorous foundation.

The principles of present day information theory are the results
of attempts to solve the very basic problem of calculating the amount
of information contained in various random, but not very simple,
objects. This measure of the information content, called entropy,
provides, at the very least, a criterion for evaluating any encoding
procedure [18]. In fact, the usual unit of entropy, the bit, indicates
the preoccupation of the present theory with symbols and codes.
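
As a brief illustration of entropy measured in bits, the following sketch
evaluates the standard Shannon formula H = -sum p log2 p for a discrete
source. The function name and the example distributions are assumptions
made here for illustration; they do not appear in the report.

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy in bits; outcomes with zero probability contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)

# Assumed example sources.
fair_coin = entropy_bits([0.5, 0.5])        # equally likely outcomes
biased_coin = entropy_bits([0.9, 0.1])      # a more predictable source
four_symbols = entropy_bits([0.25] * 4)     # four equally likely symbols

print(fair_coin, biased_coin, four_symbols)
```

The more predictable source has the lower entropy, which is why entropy
serves as a yardstick for an encoding procedure: no lossless code can use
fewer bits per symbol, on the average, than the source entropy.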


For  the  most  part,  information  theory  has  remained  the  private 
preserve  of  the  coding  theorists,  and  it  is  perhaps  unfortunate 
that  the  effect  of  the  need  for  high  powered  space-age  coding 
analysis  has  been  to  diminish  the  true  universal  value  of  entropy 
as  a measure  of  information.  It  is  rather  apparent  that  there  has 
been  relatively  little  conclusive  work  applying  the  concepts  of 
entropy to other problem areas. Nevertheless, a few valuable
applications, generalizations, and interpretations of the theory have
been  obtained,  with  the  most  useful  results  appearing  for: 

a. The Design of Experiments [19,20]. DeGroot's effort [20]
in this field is even more remarkable than is first apparent because
useful results regarding the sequential design of experiments may
be obtained from a very general axiomatic definition of uncertainty.
This implies that for other areas of interest (filtering, controlling,
etc.) the specific measures of uncertainty such as entropy and
variance may be interchangeable.

b.  One  Armed  Bandit  Problems  [21].  Kelly's  approach  to  a 
gambling  problem  [21]  is  probably  one  of  the  earliest  non-coding 
applications  of  entropy.  Since  gambling,  economics,  filtering,  etc., 
all have a common probabilistic basis, it will not be very surprising
if derivations such as those made by Kelly are made in other
fields.

c. Human Operator Systems [22]. Elkind's paper [22] is an
attempt to apply entropy concepts to a type of system (human operator)
that is not easily analyzed using existing techniques. This work
must definitely be considered a precursor of the entropy analysis of
automatic control systems. However, because his analysis depends
on the information transmitted by the closed loop system, Elkind
overlooks two important facets of the problem:

i)  The  information  transmitted  by  the  operator 
as  an  individual  component  of  the  whole  system  is 
not  known. 

ii) The effect of the limited "channel capacity"
of the human operator on the total system performance
is not known.

d. Systems with Random Parameters [23]. When a system has
random parameters it is often difficult to describe the system
response. The work of Foy [23] is an attempt to bridge this gap
through the use of entropy. The class of systems investigated may
be described by ordinary linear differential equations. The input
forcing function and the coefficients of the differential equation
have a known mathematical structure but contain certain parameters
which have known probability distributions. The effect of the
parameter variations on the output can then be measured through the use
of the "instantaneous" output entropy (i.e., the entropy measure
of uncertainty as to the value of the output that will be observed
for a given value of the independent variable) as opposed to the
use of the ensemble entropy (i.e., the entropy uncertainty as to
which one of the class of output functions will be observed) for
the same situation.

Unfortunately the general formula for the entropy of the output,
that is in this dissertation, is mathematically intractable, so
it is necessary for Foy to derive upper bounds through the use of the
properties of the Gaussian probability density function. The
effectiveness of these bounds is supported by an experimental program on
an analog computer. Examples, based on a useful and recognizable
feedback loop system, are presented, but it is these examples that
point out the fundamental weakness of the report, that weakness being
that the evaluation of instantaneous output entropy has no useful
interpretation, as yet. It does not appear to be a meaningful design
criterion.

The reason for this may not be the inappropriateness of the
entropy measure but the inconclusiveness of the research. This
inconclusiveness may yet be resolved if Foy's work can be reevaluated in
the context of an adaptive control system analysis. Certainly the
usefulness of any adaptive controller must be measured, and the results
of current research clearly indicate the effectiveness of entropy for
just this purpose. Foy's work may well prove to be the cornerstone
of a totally new approach to adaptation.

Despite the examples described above, for the most part it has
still been the motivation of the problems of communication, i.e., the
problem of transmitting knowledge from point to point, that has attracted
the attention of the information theorists [24,25,26,27,28]. Reduced
to its simplest abstraction, the communication problem concerns itself
with: (1) a signal (or message) drawn in some fashion from a predefined
signal vocabulary, (2) a channel for transmitting the message, and (3)
a receiver for deciding the content of the received signal. By
assigning a measure to the information content of a message, information
theory provides an invaluable tool for assessing the worth of
the coding procedure, analyzing the performance of the channel, and
describing the efficiency of the receiver. By tracing the message
content from point to point in the communication system the concept
of a system channel capacity follows directly and various design
approaches evolve. In fact, the theory of channel capacity represents
the major contribution of Shannon's pioneering effort in communication
theory. With the fundamental channel capacity theorem given by

$$C = B \log_2\left(1 + \frac{P}{N}\right) \text{ bits,}$$

Shannon [1] showed conclusively how it was possible to exchange channel
bandwidth (B) for signal power (P) in order to maintain constant
channel capacity (C) for the same noise power (N).
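
The exchange of bandwidth for signal power can be made concrete with a
short sketch based on the capacity relation C = B log2(1 + P/N). The
function names and the numerical values are assumptions chosen here for
illustration, and the noise power N is held fixed as in the statement
above.

```python
import math

def channel_capacity(bandwidth, signal_power, noise_power):
    """Capacity C = B * log2(1 + P/N)."""
    return bandwidth * math.log2(1.0 + signal_power / noise_power)

def power_for_capacity(capacity, bandwidth, noise_power):
    """Signal power needed to hold the capacity fixed at a given bandwidth."""
    return noise_power * (2.0 ** (capacity / bandwidth) - 1.0)

# Assumed operating point: B = 1000, P = 15, N = 1.
noise_power = 1.0
capacity = channel_capacity(1000.0, 15.0, noise_power)

# Doubling the bandwidth lets the signal power drop while C stays the same.
reduced_power = power_for_capacity(capacity, 2000.0, noise_power)

print(capacity, reduced_power)
```

With these assumed numbers the capacity is maintained with one fifth of
the original signal power once the bandwidth is doubled.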

The mathematical description of the channel behavior becomes more
obscure when the signals involved are continuous. The process of
passing from the discrete situation, with its secure foundation of
intuitively acceptable results, to the continuous case is extremely
tortuous, especially in view of the spectre of the ever-present paradox
of a signal carrying infinite information in zero time [29,30]. This
was a weak spot in the original Shannon work and to avoid his
ambiguities and paradoxes it has been necessary for Gelfand [31], Pinsker
[32], and Hyang [33] to each make new definitions. Hyang's work is
probably the most up-to-date research on the entropy of time continuous
processes and for background it contains a lengthy and important
discussion of the necessity for carefully defining a
continuous-time-process information measure.

Hyang overcomes the most pressing difficulties in the following
manner. Let

$$S_i = \int_0^T s(t)\,\phi_i(t)\,dt, \qquad i = 1, \ldots, K$$

$$Y_j = \int_0^T y(t)\,\psi_j(t)\,dt, \qquad j = 1, \ldots, M$$

where

$$y(t) = s(t) + n(t),$$

s(t) is the message, and $\phi_i$ and $\psi_j$ are arbitrary functions. The average
information in y(t) about s(t) is defined as

$$I(y(t); s(t)) = \sup I(Y_1, Y_2, \ldots, Y_M;\, S_1, \ldots, S_K)$$

where the supremum is taken over all $\phi_i$, $\psi_j$, K and M.

The power of this definition can be seen from the fact
that there are no restrictions on

1. The process spectrum

2. The process stationarity

3. The observation interval.


It is therefore not surprising to learn that no general results
have been obtained using this definition, so that Hyang then finds
it convenient to consider only the class of Gaussian random processes.
For this class of problems the covariance function describes completely
the process probability density function, so that after some
manipulation it can be shown that the necessary basis functions for
the information quantity are the eigenfunctions found from

$$\int_0^T R_y(t,u)\,\phi(u)\,du = \sigma \int_0^T R_s(t,u)\,\phi(u)\,du.$$

0 u 

This  integral  equation  is  a fundamental  equation  in  mean  square  error 
analysis  and  in  that  context  it  has  led  to  the  formulation  of  the 
Wiener-Hopf  equation  [34]  for  linear  estimation  and  the  formulation  of 
polynomial  estimators  for  nonlinear  estimation  [35].  Of  course  these 
classical  theories  now  become  even  more  interesting  because  they  have 
been  derived  from  a purely  information  theoretic  point  of  view. 
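
A discretized sketch may help show why eigenfunctions of the covariance
arise for Gaussian processes. It is not Hyang's construction: it samples
the signal at a finite set of times, assumes an exponential signal
covariance and white additive noise, and uses the standard Gaussian
result I = (1/2) log2 [det(R_s + R_n) / det(R_n)]. All names and values
are assumptions made here for illustration.

```python
import numpy as np

def gaussian_mutual_information_bits(signal_cov, noise_cov):
    """I(Y; S) in bits for Y = S + N with independent zero-mean Gaussian vectors."""
    _, logdet_total = np.linalg.slogdet(signal_cov + noise_cov)
    _, logdet_noise = np.linalg.slogdet(noise_cov)
    return 0.5 * (logdet_total - logdet_noise) / np.log(2.0)

# Assumed sampling grid and covariances (illustrative only).
times = np.linspace(0.0, 1.0, 20)
signal_cov = np.exp(-np.abs(times[:, None] - times[None, :]))
noise_var = 0.1
noise_cov = noise_var * np.eye(len(times))

# Direct evaluation from the covariance matrices.
direct = gaussian_mutual_information_bits(signal_cov, noise_cov)

# Evaluation in the eigenvector basis of the signal covariance, where the
# problem separates into independent scalar channels.
eigenvalues = np.linalg.eigvalsh(signal_cov)
per_mode = 0.5 * np.log2(1.0 + eigenvalues / noise_var)

print(direct, per_mode.sum())
```

In the eigenvector basis each mode contributes (1/2) log2(1 + lambda/sigma^2)
bits, and the contributions add up to the directly computed value. The
eigenvectors play the role that the eigenfunctions of the integral
equation play in the continuous case.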

Hyang's results are important for two reasons: first, they lead
to valid results in the almost totally uninvestigated field of
continuous process entropy, and second, they indicate to even the casual
observer that stronger (stronger than Gaussian) results for the
solution of estimation problems may be obtained through the use of
entropy analysis.

The very extensive and quite practically oriented examples
chosen for demonstration of the application of the time continuous
theory are only further proof that information theory has far exceeded
information practice, because at no time is Hyang able to present a
clear motivation for wanting to calculate continuous time entropy in
the first place. In general, it is not the theoretical limitations
of time continuous entropy that limit the usefulness of entropy
techniques for describing and analyzing objects. Rather, it is the
assumed requirement for an optimal code which achieves arbitrarily
small error, for the cost of faithful reproduction of a message must,
in general, be paid for with infinite time delays.

Such considerations and restrictions are unheard of in the design
of feedback control systems. Conventional feedback control systems
are usually continuous, long time delays are intolerable, and very
rarely are there any coding considerations*. However, these
differences between communication systems and feedback systems do not
automatically preclude the application of information theory to control
system analysis. Actually, it is in the areas of degradation of
performance due to noise and channel capacity that communication and
feedback systems show the most resemblances. The purpose of this
dissertation is to show how the basic theory of information may be
extended to take advantage of these similarities and accommodate the
large class of feedback control problems.

*When a coding problem does arise, such as in a system using analog
and digital components in a hybrid configuration, the coding is not
germane to the feedback problem.

If new results are to be obtained, it appears that it will be
necessary in some way to rise above the restrictions implied by the
coding approach to information theory and to reinterpret the concept
of entropy in terms generic to the operation of a feedback control
system as opposed to the operation of a communication system. For
example, by abandoning coding concepts, Wiener [36] is able to use
mutual information to derive filtering equations that are usually
arrived at by the conventional methods of mean square error analysis.

In the field of statistical analysis, Gardner and McGill [37] are
able to demonstrate an entropy approach to the partitioning of
variability, while Chaitanya Swarup [38] develops an informational
description of the properties of estimators and hypothesis tests. Foy's work,
already cited [23], is still another example of the non-coding aspects
of entropy being applied for analysis purposes.

Krasovskiy [35,40], who considers a system that may be described
by a set of nonlinear differential equations with time varying
nonrandom coefficients, takes a slightly different point of view from
that taken by Foy for investigating the entropy of the output of a
dynamical system. When the unforced dynamical system is linear and
can be described as

$$\frac{dx_i}{dt} + \sum_{k=1}^{N} a_{ik}(t)\,x_k = 0 \qquad (i = 1, 2, \ldots, N),$$

then he shows that the instantaneous entropy of the state vector,

$$X = (x_1, x_2, \ldots, x_N),$$

may be found from

A derivation  is  also  presented  by  Krasovskiy  which  allows  for  a random 
forcing  function.  Unfortunately  this  formulation  is  too  unwieldy  to 
be  of  use  for  anything  but  the  Gaussian  situation.  In  addition  to  this 
fault  the  paper  suffers  from  two  other  important  defects: 


1. The equations of motion must be written for the system as
a whole; subsystems cannot be examined separately and then connected
output-to-input. If the result of the entropy analysis is to be used
to design a system component, then the form of the differential
equation description of that component must be known down to the last
parameter.

2. Krasovskiy presents no motivation for calculating the output
entropy. Examples are presented and this quantity is determined but
there is no justification for doing so.
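
For an unforced linear system of the form dx/dt + A x = 0, the way the
entropy of the state evolves can be sketched numerically. The sketch
below is not Krasovskiy's derivation; it relies on the standard
change-of-variables property of differential entropy, under which a
linear map Phi shifts the entropy by log|det Phi|, and it assumes a
constant coefficient matrix and a Gaussian initial state so that the
result can be checked in closed form. The matrix entries, the initial
covariance, and the function names are all illustrative assumptions.

```python
import numpy as np

def transition_matrix(a_matrix, elapsed, terms=60):
    """Phi = exp(-A * elapsed) by a truncated Taylor series (small, well-scaled A only)."""
    n = a_matrix.shape[0]
    phi = np.eye(n)
    term = np.eye(n)
    for k in range(1, terms):
        term = term @ (-a_matrix * elapsed) / k
        phi = phi + term
    return phi

def gaussian_entropy_bits(cov):
    """Differential entropy of a zero-mean Gaussian vector with covariance cov, in bits."""
    n = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (n * np.log(2.0 * np.pi * np.e) + logdet) / np.log(2.0)

# Assumed constant coefficients and initial state covariance.
a_matrix = np.array([[0.5, 1.0], [-1.0, 0.3]])
initial_cov = np.array([[2.0, 0.4], [0.4, 1.0]])
elapsed = 1.5

# Propagate the state covariance through x(t) = Phi x(0).
phi = transition_matrix(a_matrix, elapsed)
final_cov = phi @ initial_cov @ phi.T

entropy_change = gaussian_entropy_bits(final_cov) - gaussian_entropy_bits(initial_cov)

# Closed form: det(exp(-A t)) = exp(-trace(A) t), converted to bits.
predicted_change = -np.trace(a_matrix) * elapsed / np.log(2.0)

print(entropy_change, predicted_change)
```

Only the trace of the coefficient matrix enters the entropy change, which
illustrates the point made above: once the equations of motion are
written for the system as a whole, the entropy calculation no longer
distinguishes how the subsystems were interconnected.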

Because the techniques of Foy and Krasovskiy begin with the
system equations of motion, both are general enough to include
feedback systems. But once the dynamical equations are written, it is
no longer apparent whether the original system did or did not have a
feedback path. Certainly it should be expected that any useful
application of entropy as an analysis tool should maintain the distinction
between feedback and non-feedback systems and, in some cases, even
provide greater insight into the basic differences between them.

The ability of the system to act upon a command message and to
control, through the use of a feedback mechanism, an external object,
most clearly distinguishes the feedback system from the communication
system, and yet this has always been the most overlooked aspect of
the problem. Even in Wiener's Cybernetics [36], which appears to be
the earliest attempt to make use of entropy as a statistical design
property, this fundamental distinction between the control problem
and the filtering problem is ignored. Yet surprisingly, the reward
for insisting that the control system equations specifically
emphasize the feedback form of the configuration is that when an
entropy analysis is made the resulting parameters that describe the
system performance can all be calculated from knowledge of the open
loop behavior of the system.

The  development  and  proof  of  this  very  general  result  represents 
the  main  contribution  of  this  dissertation. 

1.2  General  Description  of  Problems  Considered  in  this  Dissertation 

Two classes of problems are considered in this dissertation. They
are:

1. The estimation problem shown in block diagram form in
Figure 1.1, and

2. the feedback problem* shown in Figure 1.2.

Even  though  they  represent  fundamentally  different  applications, 
it  is  nevertheless  true  that  both  problems  have  a great  deal  of 
similarity.  Both  utilize  noisy  measurements  in  order  to  obtain  a 
quantity  which,  when  subtracted  from  the  processed  signal,  produces 
an  error  which  is  minimum  in  the  sense  of  a predefined  criterion. 

In fact it is the demonstration of the similarity of the behavior of
the noisy sensor in both types of systems that unifies these problems
and which occupies the major effort of the analysis contained in this
report.

*In the category of feedback control problems are included both noise
rejecting and tracking systems. Where it is appropriate to do so,
both types of systems will be referred to simply as feedback systems.


[Block diagram: NOISY SENSOR, MEASUREMENTS, DATA PROCESSOR, ESTIMATE OF THE PROCESSED SIGNAL]

Figure 1.1. The Estimation Problem.


All  the  variables  encountered  in  the  problems  in  this 
dissertation  are  assumed  to  be  sampled  in  time  [not  necessarily 
regularly],  and  to  be  continuous  in  an  arbitrary  but  predefined 
probability  space.  The  signal  is  taken  to  be  statistically  independent 
of  the  sensor  errors. 

Practical experience seems to indicate that the performance of
these systems must be specifically limited by the behavior of the
noisy sensor, and yet conventional applications of presently accepted
analysis techniques do not usually lead to this desired
interpretation. In a very heuristic way a verbal description of the signal
flow couched in the information theory language of uncertainty and
channel capacity seems to provide the right approach toward evaluating
the effect of the sensor on the performance attainable by such systems.
However, even though the existence of a new theory of analysis based
on sensor properties is suspected, a great deal of sophisticated
manipulation involving entropy expressions is required in order to
develop this theory, as will be seen later.

1.3  Brief Summary of Contents

Chapter Two develops the concept of entropy into a useful
analysis tool. Beginning with a motivation for using entropy and
related concepts to analyze control problems, it continues by outlining
some of the more important formal properties of both entropy and
mutual information and how these quantities change when acted upon
by circuit elements. It is also convenient to introduce in this
chapter the concepts of entropy of Markoff sources, entropy of sum
signals, and the transmittance and incremental transmittance of
noisy sensors. Alone, these quantities have no intrinsic value,
but their evaluation leads to proficiency with the entropy criterion
function and paves the way for the useful interpretation of the
important results relating to analysis of control systems which come
later. The channel transmittance of the sensor, which is the
information obtained about all of the signal samples from all of the
measurements, must, when the time comes to use the concept, be the
limiting factor on the ability of a measurement path to improve
system performance using either feed-forward or feedback signal
processing. Markovian entropy, on the other hand, demonstrates that
acquiring new measurements (or alternately saving old measurements)
becomes less and less effective for reducing the signal uncertainty.
This in turn leads to a proof of the existence of steady-state
(non-perfect) performance solutions for the given systems.

Chapter Three initiates the effective use of entropy as a tool
for the analysis of systems which estimate the values of random signals.
The fundamental result is an equation relating the significant
informational quantities encountered in this type of problem. This
equation may then be rearranged to demonstrate:

1.  Gaussian  linear  mean  square  error  analysis  is  a special  case 
of  the  entropy  solution. 

2.  The  error  vector  entropy  for  the  system  is  bounded  by  a 
quantity  which  depends  on  the  channel  transmittance  of  the  sensor. 




3.  The  coordinate  entropy  of  the  error  is  bounded  by  a quantity 
which  depends  on  the  incremental  channel  transmittance  of  the  sensor. 

These results bear two important resemblances to the pioneering work
of Shannon. The first similarity, unfortunately, is a requirement
that optimum system performance can only be obtained by delaying all
data processing one entire message length. This is the "infinite
delay" that is required by a Shannon code in the general case in
order to achieve error free transmission. For some types of
estimation this is not a handicap, but real time estimation and
feedback systems demand lagless data processing and can tolerate no delays.

On  the  plus  side,  the  second  similarity  is  that,  just  as  with  the 
Shannon  procedure,  it  is  sufficient  to  prove  the  existence  of  a coding 
procedure  [in  the  case  of  estimation  to  prove  the  existence  of  a 
filtering  function]  and  evaluate  the  system  performance  as  if  the  code 
were  known.  Never  once,  in  the  course  of  realizing  the  benefits 
of  the  work  in  this  chapter  is  it  actually  necessary  to  know  the 
optimizing  filter. 

These two observations on the theorem of Chapter Three hold out
a promise of deriving a real time solution without sacrificing the
decidedly important advantage of a "disappearing" filter function.

Before this aspect of estimation is investigated, Chapters Four and
Five are devoted to studying the feedback control problem and the
tracking problem respectively. The results for both of these problems
are very similar in form to the result derived for the estimation
problem and are subject to the same sort of interpretations. Of course,
as was suspected in the analysis of the conclusions to the theorem
of Chapter Three, the bounds derived for feedback systems are too
loose and cannot be achieved through the use of physically realizable
components. These results cannot, therefore, be shown to be
analogous to solutions derived by the application of Gaussian-linear
mean square techniques. The important conclusions of these two
chapters are:

1.  Entropy  analysis  can  lead  to  useful  results. 

2.  Sensor  channel  transmittance  is  always  a limiting  factor 
to  the  system  performance. 

3.  It  is  never  necessary  to  know  the  optimum  feedback  filter 
or  to  evaluate  the  closed  loop  gain  function,  in  order  to  determine 
the  bounds  on  system  entropy  performance. 

The real contribution of this dissertation is made in Chapter Six.
Even though the channel approach to system theory is an original
analysis technique, it is only when this theory is applied to real
time data processing systems that significant advances in the
understanding of feedback control systems are made. Chapter Six begins
by deriving a real time property of sensors referred to as "The
Sequential Channel Transmittance." This quantity relates the
parameters of the sensor to its real time ability to provide new
information about a signal when it is used in a suitable measuring path.

For both the estimation and feedback configurations it is proved
that:

1.  Gaussian  linear  mean  square  analysis  of  real  time  sampled 
data  systems  is  a special  case  of  the  sequential  entropy  solution. 

2.  The  entropy  of  the  error  is  bounded  by  a quantity  dependent 
on  the  sequential  channel  transmittance  of  the  sensor. 

3.  All  performance  quantities  can  be  determined  without  actually 
solving  the  optimum  entropy  problem  for  the  solution  filter. 

Chapter Seven discusses the extension of the theories to
continuous time systems. Based on an axiomatic description of
uncertainty and mutual information, there is no reason to expect
that all the results of the first six chapters do not carry over in
total to the continuous time problem. Unfortunately, at the present
time the theory of continuous entropy is not sufficiently developed
to overcome the onus (or at least seemingly unexplainable fact) of
infinite signal uncertainty. It is not known whether this limitation
is due to the lack of experience on the part of researchers in
understanding continuous entropy, or due to the need for an entirely new
defining equation for entropy, such as one based on integration in
function space. In either case the example presented in this chapter
does provide hope for the eventual development of satisfactory
continuous time entropy analysis.

A very elementary approach to the problems of adaptive control is
considered in Chapter Eight. Considering only one part of the adaptive
process, the identification of unknown system parameters, an example
and a theorem are presented which again demonstrate the importance
of the sensor channel transmittance. While this is a new and
important result for the theory of adaptation, it still remains to
consider the entire adaptive process and to prove that the rate of
improvement of the system performance is a function of the sensor
channel properties.

Chapter Nine summarizes the principal results of the dissertation,
discusses the limitations and disadvantages of the procedures developed,
and indicates the areas which require further investigation.




CHAPTER TWO

DEVELOPMENT  OF  ENTROPY  AS  AN  ANALYSIS  TOOL 
2.1  Why  Entropy? 

Entropy (or uncertainty, as it is often called) has found
widespread application among communication engineers. When used to
describe a process having a countable number of states, it has a very
definite interpretation which leads from information content (bits)
to coding length (bits/code symbol) to channel capacity (maximum
bits/second) and then to practical system design. But its use is
definitely limited when applied to continuous processes (even processes
continuous in state space but discrete in time). Continuous signal
entropy may be negative, mutual information can be infinite, and
even simple linear transformations are no longer entropy invariant.

But these are not actually limitations on entropy as a system
criterion function, but more on entropy as an intuitively understandable
concept. It is when entropy is treated as a cost function and not
as a measure of the content of communication signals that it finds usage
in the analysis of control systems.

The entropy H(x) of a scalar variable x, having a probability
density function $p_x(x)$, is defined as:

$$ H(x) = \int_{-\infty}^{\infty} d\alpha \; p_x(\alpha) \log \frac{1}{p_x(\alpha)} . $$

In simple terms this function measures the "spread" of the random
variable x. A variable whose pdf is more concentrated than another's
will have less entropy than the other variable. As a criterion,
entropy is analogous to the mean square error, which measures the
second order spread (the variance) of a variable. In fact, for
Gaussian random variables there is a one-to-one relationship between
variance and entropy, so that when used as a criterion for system
design minimum mean square error must always be equivalent to
minimum entropy.


There  are  several  other  important  probability  density  functions 
for  which  minimum  entropy  is  equivalent  to  minimum  mean  square 
error.  For  example: 


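1.  The Rectangular Distribution

$$ p(x) = \begin{cases} \dfrac{1}{b-a}, & a < x < b \\ 0, & \text{otherwise} \end{cases} $$

$$ E\{x\} = \frac{a+b}{2}, \qquad \mathrm{VAR}\{x\} = \frac{(b-a)^2}{12} = \sigma^2 $$

$$ H(x) = \ln (b-a) = \ln \sigma + \tfrac{1}{2} \ln 12 $$

2.  The Exponential Distribution

$$ p(x) = a e^{-ax}, \qquad x > 0 $$

$$ E\{x\} = \frac{1}{a}, \qquad \mathrm{VAR}\{x\} = \frac{1}{a^2} = \sigma^2 $$

$$ H(x) = \ln \frac{e}{a} = \ln (e \sigma) $$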

This benign relationship between the variance and the entropy
foretells great possibilities for the use of entropy as a tool for
system performance analysis.
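The closed-form entropies of the rectangular and exponential densities can be checked numerically. The following is a minimal sketch and not part of the original report: it integrates -p ln p with a midpoint rule and compares the result with ln σ + ½ ln 12 for the rectangular density and ln(eσ) for the exponential density. The function name `entropy_numeric`, the parameter values, and the truncation of the exponential tail at 40/lam are illustrative assumptions.

```python
import math

def entropy_numeric(pdf, lo, hi, n=100_000):
    """Midpoint-rule estimate of the integral of -p(x) ln p(x) over [lo, hi]."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        p = pdf(lo + (i + 0.5) * dx)
        if p > 0.0:
            total -= p * math.log(p) * dx
    return total

# Rectangular density on (a, b); the end points are assumed values.
a, b = 1.0, 4.0
sigma_rect = (b - a) / math.sqrt(12.0)
h_rect_formula = math.log(sigma_rect) + 0.5 * math.log(12.0)
h_rect_numeric = entropy_numeric(lambda x: 1.0 / (b - a), a, b)

# Exponential density with rate lam (an assumed value); sigma = 1 / lam.
lam = 2.0
sigma_exp = 1.0 / lam
h_exp_formula = math.log(math.e * sigma_exp)
h_exp_numeric = entropy_numeric(lambda x: lam * math.exp(-lam * x), 0.0, 40.0 / lam)

print(h_rect_formula, h_rect_numeric)
print(h_exp_formula, h_exp_numeric)
```

In both cases the entropy is a monotonic function of σ alone, which is why minimizing one is equivalent to minimizing the other within each family.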

This relationship is, of course, not satisfied in general, so
that minimum entropy designed systems are not always identical to
minimum mean square designed systems. However, it is not the use
of entropy as a design tool that is considered in this report. Rather
it is the use of entropy as a design criterion so as to set bounds on
possible system performance, and the inequalities relating entropy to
variance, that are considered. The ability to write entropy
expressions without first having to design the optimum system
distinguishes entropy analysis from conventional second order
considerations.

The  remainder  of  this  chapter  will  be  devoted  to  developing  the 
basic  properties  of  entropy  and  mutual  information  as  they  apply  to 
systems  analysis.  Important  theorems  relating  to: 

1.  The entropy of Markoff sources,

2.  the  entropy  of  signals,  conditioned  on  measurements, 

3.  the  channel  transmittance  of  a sensor,  and, 

4.  the  incremental  channel  transmittance  of  a sensor, 

are presented in anticipation of a need for these results in the body
of the dissertation. It is interesting to notice that the properties
of entropy are in exact agreement with how intuition says an
information measuring quantity should behave.

For the reader already familiar with the concepts of information
theory, the results of the next two sections are summarized in Table I.

2.2  Properties  of  Entropy 

The entropy of a K dimensional vector random variable having a
probability density function that is continuous in all the components
is defined by Shannon [1] on page 54 as:




TABLE I

Properties of Entropy and Mutual Information

1.  $H(X) = -\int \cdots \int dx_1\, dx_2 \cdots dx_K \; p_x(x_1, x_2, \ldots, x_K) \log p_x(x_1, x_2, \ldots, x_K) = -\int dX \; p_x(X) \log p_x(X)$

2.  $H(Y) = H(X) + \int dX \; p_x(X) \log \left| \mathrm{DET} \left[ \dfrac{\partial f_i(X)}{\partial x_j} \right] \right|$, where $Y = F(X)$

3.  $H(X/Y) = \int dX \int dY \; p(X,Y) \log \dfrac{p(Y)}{p(X,Y)}$

4.  $H(X,Y) = \int dX \int dY \; p(X,Y) \log \dfrac{1}{p(X,Y)}$

5.  $H(Y,X) = H(X,Y)$

6.  $H(X,Y) = H(X) + H(Y/X)$

7.  $H(X,Y) = H(Y) + H(X/Y)$

8.  $H(Y) = H(y_K, y_{K-1}, \ldots, y_2, y_1)$

9*.  $H(X,Y) \le H(X) + H(Y)$

10*.  $H(X/Y) \le H(X)$

11*.  $H(Y/X) \le H(Y)$

12.  $H(Z/Y,X) \le H(Z/Y)$

13.  $I(X;Y) = \int dX \int dY \; p(X,Y) \log \dfrac{p(X,Y)}{p(X)\,p(Y)}$

14*.  $I(X;Y) = I(Y;X) \ge 0$

15.  $I(X;Y) = H(X) + H(Y) - H(X,Y)$

16.  $I(X;Y) = H(X) - H(X/Y) = H(Y) - H(Y/X)$

17.  $I(X;Y,Z) \ge I(X;Y)$

*The equality holds if X and Y are independent random variables.




$$ H(X) = -\int_{-\infty}^{\infty} dx_1 \cdots \int_{-\infty}^{\infty} dx_K \; p_x(x_1, \ldots, x_K) \log p_x(x_1, \ldots, x_K) = -\int_{-\infty}^{\infty} dX \; p_x(X) \log p_x(X) \tag{2.2.1} $$

where the vector notation $X = \text{column}\,\{x_1, x_2, \ldots, x_K\}$ is used. In all
cases in this dissertation capital letters are used to denote the
vector and lower case letters the vector components.

From the communications point of view continuous entropy has
three "disadvantages" when compared to the entropy of variables
having discrete probability densities:

1.  Continuous entropy is not always non-negative.

2.  Continuous entropy is not always finite.

3.  Continuous entropy is not always invariant under linear
transformations.

These properties clearly handicap the interpretation of entropy
as a measure of uncertainty. For control system applications,
however, there is no obligation to view entropy as anything more than a
suitable criterion function. As an example of the variability of
entropy under coordinate transformations, examine the vector random
variables X and Y, where

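$$ Y = F(X) $$

or

$$ y_i = f_i(x_1, x_2, \ldots, x_K) . $$

If F(X) is continuous and one-to-one, the two probability density
functions are related by (see Parzen [41], pages 329-331):

$$ p_x(X) = p_y(F(X)) \, J $$

where J is the Jacobian of the transformation and it is defined as

$$ J = \left| \mathrm{DET} \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_K} \\ \vdots & & \vdots \\ \dfrac{\partial f_K}{\partial x_1} & \cdots & \dfrac{\partial f_K}{\partial x_K} \end{bmatrix} \right| = \left| \mathrm{DET} \left[ \frac{\partial f_i(X)}{\partial x_j} \right] \right| . $$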

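The entropy of X is then given by

$$ H(X) = -\int_{-\infty}^{\infty} dX \; p_x(X) \log p_x(X) $$

$$ = -\int_{-\infty}^{\infty} dX \; p_y(F(X)) \, J \left[ \log p_y(F(X)) + \log J \right] $$

$$ = -\int_{-\infty}^{\infty} dY \; p_y(Y) \log p_y(Y) - \int_{-\infty}^{\infty} dX \; p_x(X) \log J $$

or finally

$$ H(Y) = H(X) + \int_{-\infty}^{\infty} dX \; p_x(X) \log J \tag{2.2.2} $$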


This result is due to Shannon [1], page 57, and is listed in Table I
as property 2. If F(X) is the scalar linear transformation

y = f(x) = ax + b, for which the magnitude of the Jacobian is |a|,

then the entropy of y is

H(y) = H(x) + LOG |a| .

Thus, depending on the value of "a," the entropy of y can be either
greater or less than the entropy of x. H(y) = H(x) if and only
if |a| = 1. The whole question of the entropy of transformed vectors
is so important that it will be studied again in Section 2.4.
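The scalar result H(y) = H(x) + LOG |a| can be illustrated with a Gaussian variable, for which the entropy is ½ ln(2πe σ²) in nats. This is only a sketch under that Gaussian assumption; the helper names `gaussian_entropy` and `entropy_shift` and the numerical values are hypothetical and chosen for illustration.

```python
import math

def gaussian_entropy(variance):
    """Entropy in nats of a scalar Gaussian with the given variance."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)

def entropy_shift(a, var_x):
    """H(y) - H(x) for y = a*x + b with Gaussian x; the offset b has no effect."""
    # Var(a*x + b) = a**2 * Var(x)
    return gaussian_entropy(a * a * var_x) - gaussian_entropy(var_x)

var_x = 2.0  # assumed variance of x
for a in (0.5, -1.0, 3.0):
    # The two printed numbers agree: the entropy change equals ln|a|.
    print(a, entropy_shift(a, var_x), math.log(abs(a)))
```

A gain with |a| < 1 lowers the entropy, |a| > 1 raises it, and a = -1 leaves it unchanged, which is why the invariance condition involves the magnitude of a.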

Associated  with  the  pair  of  vectors  X and  Y,  which  possess 
continuous  marginal  densities  and  a continuous  joint  probability 
density  function,  are  the  conditional  entropies  H(X/Y) , H(Y/X)  and 
the  joint  entropy  H(X,Y).  These  quantities  are  defined  as: 


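$$ H(X/Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} dX \, dY \; p(X,Y) \log \frac{p_y(Y)}{p(X,Y)} \tag{2.2.3} $$

$$ H(Y/X) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} dX \, dY \; p(X,Y) \log \frac{p_x(X)}{p(X,Y)} \tag{2.2.4} $$

$$ H(X,Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} dX \, dY \; p(X,Y) \log \frac{1}{p(X,Y)} \tag{2.2.5} $$

$$ H(Y,X) = H(X,Y) \tag{2.2.6} $$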

The  following  equations  relate  the  entropy  expressions: 

H(X,  Y)  = H(X)  + H(Y/X)  (2.2.7) 

= H(Y)  + H(X/Y)  . (2.2.8) 

When  X and  Y are  independent  random  variables  the  joint  probability 
density  function  is  simply 

p(X,  Y)  = px(X)  Py(Y) , 
and  then 

H(Y/X)  = H(Y)  (2.2.9) 

H(X/Y)  = H(X)  (2.2.10) 

H(X,  Y)  = H(Y)  + H(X)  . (2.2.11) 




The conditional entropy provides a simple way of expressing
the entropy of a K dimensional vector as the sum of one dimensional
entropies. First note that

$$ p(x_1, x_2, \ldots, x_K) = p(x_K/x_{K-1}, \ldots, x_1)\, p(x_{K-1}/x_{K-2}, \ldots, x_1) \cdots p(x_2/x_1)\, p(x_1) . $$

Then

$$ H(X) = H(x_K/x_{K-1}, \ldots, x_2, x_1) + H(x_{K-1}/x_{K-2}, \ldots, x_2, x_1) + \cdots + H(x_2/x_1) + H(x_1) . \tag{2.2.12} $$

When the $\{x_k\}$ form an M-th order stationary Markoff chain the
conditional probability densities can be simplified using:

$$ p(x_K/x_{K-1}, x_{K-2}, \ldots, x_2, x_1) = p(x_K/x_{K-1}, \ldots, x_{K-M}), \qquad K > M . $$

For such a situation, $H(x_K/x_{K-1}, \ldots, x_1) = H(x_K/x_{K-1}, \ldots, x_{K-M})$ and

$$ H(X) = H(x_K/x_{K-1}, \ldots, x_{K-M}) + H(x_{K-1}/x_{K-2}, \ldots, x_{K-1-M}) + \cdots + H(x_{M+1}/x_M, \ldots, x_1) + H(x_1, x_2, \ldots, x_M) $$

$$ = (K-M)\, H(x_{M+1}/x_M, \ldots, x_1) + H(x_1, x_2, \ldots, x_M) , \tag{2.2.13} $$

where the stationary property of the random sequence $x_1, x_2, x_3, x_4, \ldots, x_K$
is used to write this last equation.

For first order Markoff components the total entropy of the
vector X is

$$ H(X) = (K-1)\, H(x_2/x_1) + H(x_1) . \tag{2.2.14} $$
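The first order result H(X) = (K-1) H(x2/x1) + H(x1) can be checked for a stationary Gaussian first order Markoff sequence, whose covariance has entries σ² ρ^|i-j|. The sketch below is illustrative only: the Gaussian model, the helper `log_det_spd` (a small Cholesky-based log-determinant), and the values of K, ρ and σ² are assumptions, not taken from the report.

```python
import math

def log_det_spd(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][m] * chol[j][m] for m in range(j))
            chol[i][j] = math.sqrt(s) if i == j else s / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))

def gaussian_entropy(variance):
    """Entropy in nats of a scalar Gaussian with the given variance."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)

K, rho, var_x = 6, 0.8, 1.5  # assumed chain length, correlation, variance
cov = [[var_x * rho ** abs(i - j) for j in range(K)] for i in range(K)]

# Joint entropy of the K-vector from the Gaussian entropy formula.
h_joint = 0.5 * (K * math.log(2.0 * math.pi * math.e) + log_det_spd(cov))

# Markoff decomposition: H(x1) plus (K-1) copies of H(x2/x1).
h_first = gaussian_entropy(var_x)
h_step = gaussian_entropy(var_x * (1.0 - rho ** 2))  # conditional variance of x2 given x1
h_markoff = (K - 1) * h_step + h_first

print(h_joint, h_markoff)
```

The two printed values agree, so the K dimensional entropy is fixed by just two scalar entropies.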

Another useful property of the entropy of a first order Markoff
process is that

$$ H(\phi(x_k)/x_{k-1}, x_{k-2}) = H(\phi(x_k)/x_{k-1}) \tag{2.2.15} $$

where

$$ y_k = \phi(x_k), \qquad k = 1, 2, \ldots $$

is an arbitrary single valued function of x, having a continuous
(and non-zero) derivative.

If $p_1(x_K, x_{K-1}, x_{K-2})$ is the joint probability density function
of $x_K$, $x_{K-1}$, and $x_{K-2}$, and if $p_2(y_K, x_{K-1}, x_{K-2})$ is the joint
probability density function for $y_K$, $x_{K-1}$, and $x_{K-2}$, it is true that

$$ p_2(y_K, x_{K-1}, x_{K-2}) = p_2(\phi(x_K), x_{K-1}, x_{K-2}) = p_1(x_K, x_{K-1}, x_{K-2}) \left| \frac{d\phi(x_K)}{dx_K} \right|^{-1} $$

and

$$ p_2(y_K/x_{K-1}, x_{K-2}) = p_1(x_K/x_{K-1}) \left| \frac{d\phi(x_K)}{dx_K} \right|^{-1} = \frac{p_1(x_K, x_{K-1})}{p_1(x_{K-1})} \left| \frac{d\phi(x_K)}{dx_K} \right|^{-1} = p_2(y_K/x_{K-1}) = p_2(\phi(x_K)/x_{K-1}) $$

and equation (2.2.15) follows directly.

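When the components of a vector are Gaussian random variables
the probability distribution of the vector can be written as:

$$ p(X) = \frac{1}{(2\pi)^{K/2} \sqrt{\mathrm{DET}\, R_x}} \; e^{-\frac{1}{2} X^T R_x^{-1} X} \tag{2.2.16} $$

where $R_x$ is the covariance matrix with entries $r_{ij} = E\{x_i x_j\}$. The
total entropy of the vector X is then

$$ H(X) = \int_{-\infty}^{\infty} dX \; p(X) \left\{ \tfrac{1}{2} \log \left[ (2\pi)^K \, \mathrm{DET}\, R_x \right] + \tfrac{1}{2} X^T R_x^{-1} X \right\} $$

$$ = \tfrac{1}{2} \log (2\pi e)^K + \tfrac{1}{2} \log \left[ \mathrm{DET}\, R_x \right] . \tag{2.2.17} $$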




For two one-dimensional random variables, Shannon [1] has
shown that for the same variance, the random variable with a
Gaussian distribution always has a greater entropy than the random
variable with any other distribution. This conclusion leads
directly to the important inequality:

$$ \mathrm{VAR}\{x\} \ge \frac{1}{2\pi e} \, e^{2H(x)} \tag{2.2.18} $$

where x has an arbitrary probability density function and H(x) is
expressed in natural units; equality holds when x is Gaussian. This
inequality provides the bridge between conventional second order
analysis and entropy analysis.
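The bridge between variance and entropy can be made concrete by evaluating the Gaussian bound VAR{x} ≥ exp(2H(x))/(2πe), with H in nats, for a few densities whose entropy and variance are known in closed form. This is a hedged sketch: the function name `variance_lower_bound` and the parameter values are illustrative assumptions.

```python
import math

def variance_lower_bound(h):
    """Smallest variance compatible with entropy h (nats): exp(2h) / (2*pi*e)."""
    return math.exp(2.0 * h) / (2.0 * math.pi * math.e)

width = 3.0  # assumed width b - a of a rectangular density
rate = 2.0   # assumed rate a of an exponential density a*exp(-a*x)
var_g = 1.7  # assumed variance of a Gaussian density

# Each entry is (entropy in nats, variance) from the closed-form expressions.
cases = {
    "rectangular": (math.log(width), width ** 2 / 12.0),
    "exponential": (1.0 - math.log(rate), 1.0 / rate ** 2),
    "gaussian": (0.5 * math.log(2.0 * math.pi * math.e * var_g), var_g),
}

for name, (h, var) in cases.items():
    # The bound never exceeds the true variance; it is tight for the Gaussian.
    print(name, variance_lower_bound(h), var)
```

An entropy bound therefore always yields a variance bound, even when the underlying density is not Gaussian.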

A property  of  entropy  which  is  extremely  valuable  is  that  as 
more  data  becomes  available  about  the  primary  random  variable  the 
entropy  of  that  random  variable  decreases.  The  proof  of  this 
statement  concerning  the  conditional  entropy  of  Z is  embodied  in 
the  following  theorem. 

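Theorem 2.2.1:

The entropy of Z decreases as it is conditioned on more data,
i.e.,

$$ H(Z/Y,X) \le H(Z/Y) . \tag{2.2.19} $$

Proof:

From the definitions of the conditional entropies,

$$ H(Z/Y) - H(Z/Y,X) = \int dX \int dY \int dZ \; p(X,Y,Z) \ln \frac{p(Y)\, p(X,Y,Z)}{p(Y,Z)\, p(Y,X)} . $$

Now use the inequality

$$ \ln \frac{1}{a} \ge 1 - a $$

to get

$$ H(Z/Y) - H(Z/Y,X) \ge \int_{-\infty}^{\infty} dX \int_{-\infty}^{\infty} dY \int_{-\infty}^{\infty} dZ \; p(X,Y,Z) \left[ 1 - \frac{p(Y,Z)\, p(Y,X)}{p(X,Y,Z)\, p(Y)} \right] $$

$$ = 1 - \int dX \int dY \int dZ \; \frac{p(Y,Z)\, p(Y,X)}{p(Y)} $$

$$ = 1 - \int dY \; \frac{p(Y)\, p(Y)}{p(Y)} $$

$$ = 1 - 1 = 0 $$

$$ H(Z/Y) - H(Z/Y,X) \ge 0 \qquad \text{Q.E.D.} $$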

This theorem is an obvious extension of a similar theorem given
by Shannon [1] for the scalar case, i.e., by Shannon:

$$ H(z) \ge H(z/y) $$

The technique for the proof is an accepted procedure for proving
entropy inequalities and has been most effectively used by Feinstein [2]
and Abramson [7].

Since much of the solution to the estimation and feedback
control problems involves calculating the uncertainty (i.e., entropy)
of the signal sample given the measurements made of the signal vector,
it is important to consider further the properties of the entropy
of conditioned signals. However, even before the actual need for
this function arises (it first appears in 3.7) it is obvious that
it must be part of any effort to determine the effectiveness of a
procedure to predict a signal based on noisy measurements of that
signal.

It is expected that the uncertainty of the signal will decrease
as more measurements are made. However, it is not expected that
uncertainty is likely to approach certainty as the number of
measurements approaches infinity. While this is a minor point, it
spells the difference between understanding the steady-state
behavior of a system and being completely ignorant of any limiting
factors to its performance. The important attributes of the conditional
entropy function are proven in the following theorem. The
inspiration for this theorem and the procedure are due to Birch [15], but
the statement, proof and interpretation are original.

Theorem  2.2.2: 

For stationary processes the conditional entropy of the scalar
random variable $y_K$, given the last K values of the noise corrupted
signal, is a monotonically decreasing function of K and has a finite
limit, i.e.,

$$ h_K = H(y_K/Z_K) \ge H(y_{K+1}/Z_{K+1}) = h_{K+1} \tag{2.2.20} $$

and

$$ \lim_{K \to \infty} H(y_K/Z_K) = H(y_\infty/Z_\infty) > -\infty \tag{2.2.21} $$

Proof:

The monotonic behavior of the conditional entropy follows
directly from the stationarity of the processes and the previous
theorem (2.2.1):

$$ H(y_K/Z_K) = H(y_{K+1}/z_{K+1}, z_K, \ldots, z_2) \ge H(y_{K+1}/z_{K+1}, \ldots, z_2, z_1) = H(y_{K+1}/Z_{K+1}) . $$

Thus, the entropy of $y_K$, conditioned on all the past measurements,
is a decreasing function of K, where $Z_K$ is the vector whose
components, $z_k = y_k + n_k$, are the K noisy measurements made of $y_k$,
k = 1, 2, ..., K.

It is inconceivable that in ordinary physical situations
$H(y_K/Z_K) = -\infty$ for any finite K*. Singular probability
distributions can be imagined where a finite number of measurements can
be used to predict the signal with probability one [entropy of $-\infty$],
but such situations are of no real concern. The real concern is
to be able to bound the limiting value of $H(y_K/Z_K)$ away from $-\infty$.
Obviously additional conditions must be imposed to insure that with even
an infinite number of measurements the conditional uncertainty of the
signal is not $-\infty$. Consider first the situation when y is first order
Markoff; the extension to M-th order processes will be obvious.

It is convenient at this time to introduce another sequence
$\{g_K\}$ defined by:

$$ g_{K+1} = H(y_{K+1}/Z_{K+1}, y_1) . $$

Because of the process stationarity it is also true that

$$ g_{K+1} = H(y_K/z_K, z_{K-1}, \ldots, z_0, y_0) . $$

*An entropy of $-\infty$ implies a completely certain continuous random
variable.


The sequence $\{g_K\}$ is now shown to be a monotonically increasing
function of K:

$$ g_{K+1} = H(y_{K+1}/z_1, z_2, \ldots, z_{K+1}, y_1) = H(y_{K+1}/z_1, z_2, \ldots, z_{K+1}, y_1, n_1) , $$

where the rearrangement of the pair $(z_1, y_1)$ into $(z_1, y_1, n_1)$ through
the use of $z_1 = y_1 + n_1$ does not change the conditional entropy of
$y_{K+1}$. Since the $\{y_k\}$ and $\{n_k\}$ are taken as first order Markoff
sequences, the result of equation (2.2.15) implies that the
introduction of $y_0$ and $n_0$ into the expression for $g_{K+1}$ will not change
the forward conditional probabilities, i.e.,

$$ g_{K+1} = H(y_{K+1}/z_1, z_2, \ldots, z_{K+1}, y_1, n_1, y_0, n_0) $$

$$ = H(y_{K+1}/z_0, z_1, \ldots, z_{K+1}, y_1, y_0) $$

$$ \le H(y_{K+1}/z_0, \ldots, z_{K+1}, y_0) = g_{K+2} , $$

thus proving that $g_K$ is a monotonically increasing function of K. Now,
the following inequality describes the relationship between the
sequences $\{h_K\}$ and $\{g_K\}$. For all K,

$$ g_K \le g_{K+1} \le h_{K+1} \le h_K . $$

Obviously both sequences are bounded and monotonic so that the
following limits exist:

$$ \lim_{K \to \infty} h_K = h $$

$$ \lim_{K \to \infty} g_K = g $$

and

$$ g_K \le g \le h \le h_K . $$




It follows that $\lim_{K \to \infty} H(y_K/Z_K)$ is bounded away from $-\infty$ if
$g_K$ is not $-\infty$ for some finite K. Examine $g_2$:

$$ g_2 = H(y_2/z_1, z_2, y_1) . $$

When the Z process is formed by adding independent random variables,
i.e.,

$$ z_k = y_k + n_k , \qquad k = 0, 1, \ldots, K , $$

it is not possible that

$$ g_2 = -\infty $$

and the theorem is proven.
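The behavior asserted by Theorem 2.2.2 can be observed in the simplest Gaussian case. The sketch below assumes a stationary first order signal y(k+1) = a y(k) + w(k) observed as z(k) = y(k) + n(k), with independent white Gaussian w and n (white noise being a special case of the theorem's assumptions). For Gaussian variables the conditional entropy is ½ ln(2πe P), where P is the conditional variance, which here follows the standard scalar Kalman variance recursion. The function name and the values of a, q and r are hypothetical.

```python
import math

def conditional_entropies(a, q, r, steps):
    """h_K = H(y_K / z_1..z_K) in nats for the assumed Gaussian model."""
    prior = q / (1.0 - a * a)  # stationary variance of y before any measurement
    entropies = []
    for _ in range(steps):
        post = prior * r / (prior + r)  # variance of y_K after using z_K
        entropies.append(0.5 * math.log(2.0 * math.pi * math.e * post))
        prior = a * a * post + q        # predicted variance of the next sample
    return entropies

a, q, r = 0.9, 0.19, 0.5  # assumed signal gain, driving variance, sensor variance
h = conditional_entropies(a, q, r, steps=30)

# Analog of g_2: knowing y_1 exactly, the variance of y_2 given z_2 is q*r/(q+r).
g2 = 0.5 * math.log(2.0 * math.pi * math.e * (q * r / (q + r)))

print(h[0], h[1], h[-1], g2)
```

The sequence h decreases with each added measurement but levels off at a finite value that stays above g2, so the noisy sensor sets a floor on the attainable uncertainty.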

2.3  Mutual  Information 

Communication theory has given rise to a quantity which,
because of its properties, is even more valuable than entropy.
This quantity is "mutual information." Technically, the mutual
information, I(X;Y), between the two random vectors X and Y is

$$ I(X;Y) = I(Y;X) = H(X) - H(X/Y) = -\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} dY \, dX \; p(X,Y) \log \frac{p(X)\, p(Y)}{p(X,Y)} . \tag{2.3.1} $$

Unless otherwise noted, both X and Y are usually taken as K
dimensional vectors.

If H(X) is the a priori entropy of X and H(X/Y) is the entropy
of X after observing Y, then I(X;Y) is the average reduction in the
entropy of X produced by observing Y. However, it is often convenient
to use the intuitive notion that I(X;Y) is the average amount of
"information" obtained about X by being given the value of Y.




Mutual information (or transinformation) is

1.  symmetrical in X and Y,

2.  non-negative ($I(X;Y) \ge 0$),

3.  generally finite,

4.  invariant under linear transformations.

In addition, the following relationships are satisfied:

1.  $I(X;Y) = 0$ (if X and Y are independent)

2.  $I(X;Y) = H(X) + H(Y) - H(X,Y)$  (2.3.2)

3.  $I(X;Y) = H(X) - H(X/Y)$  (2.3.3)

4.  $I(X;Y) = H(Y) - H(Y/X)$  (2.3.4)

Using $I(X;Y) \ge 0$ in equations (2.3.2), (2.3.3) and (2.3.4) yields

$H(X) + H(Y) \ge H(X,Y)$  (2.3.2a)

$H(X) \ge H(X/Y)$  (2.3.3a)

$H(Y) \ge H(Y/X)$  (2.3.4a)

Mutual information behaves inversely to entropy: acquiring
more measurements increases mutual information monotonically. This
is stated as:

Theorem  2.3.1: 

If the information about X given Y is I(X;Y), then the
information about X given Y and Z must be at least as great, i.e.,

$$ I(X;Y) \le I(X;Y,Z) \tag{2.3.5} $$

where the coordinates of Z are additional coordinates of the Y
vector (or are other observations).

Proof:

The statement for this theorem is given by Gel'Fand [31], but
his proof is much too sophisticated for the analysis of sampled data
signals, so a simpler one is supplied here. Using the
definition of mutual information, I(X;Y,Z) is

$$ I(X;Y,Z) = -\int_{-\infty}^{\infty} dX \int_{-\infty}^{\infty} dY \int_{-\infty}^{\infty} dZ \; p(X,Y,Z) \log \frac{p(X)\, p(Y,Z)}{p(X,Y,Z)} $$

$$ = -\int_{-\infty}^{\infty} dX \int_{-\infty}^{\infty} dY \int_{-\infty}^{\infty} dZ \; p(X,Y,Z) \log \frac{p(X)\, p(Y)\, p(Z/Y)}{p(X,Y)\, p(Z/X,Y)} $$

$$ I(X;Y,Z) = I(X;Y) + H(Z/Y) - H(Z/X,Y) $$

and finally, using theorem 2.2.1, this equation becomes

$$ I(X;Y,Z) \ge I(X;Y) . \qquad \text{Q.E.D.} $$
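Theorem 2.3.1 can be illustrated with jointly Gaussian variables, for which every entropy reduces to a log-determinant and the factors of 2πe cancel in I = H(X) + H(observations) - H(X, observations). The model below is an assumption chosen for illustration: a scalar signal x with variance s, and two measurements y = x + n1 and z = x + n2 with independent noise variances r1 and r2. The helper names `det2` and `det3` are hypothetical.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

s, r1, r2 = 2.0, 0.5, 1.0  # assumed variances of x, n1 and n2

# I(X;Y) = H(X) + H(Y) - H(X,Y)
cov_xy = [[s, s], [s, s + r1]]
i_xy = 0.5 * math.log(s * (s + r1) / det2(cov_xy))

# I(X;Y,Z) = H(X) + H(Y,Z) - H(X,Y,Z)
cov_yz = [[s + r1, s], [s, s + r2]]
cov_xyz = [[s, s, s], [s, s + r1, s], [s, s, s + r2]]
i_xyz = 0.5 * math.log(s * det2(cov_yz) / det3(cov_xyz))

print(i_xy, i_xyz)
```

For this model the results match the closed forms ½ ln(1 + s/r1) and ½ ln(1 + s/r1 + s/r2), so the second measurement can only add information.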

2.4  Coordinate  Transformations 

Much of the work of this dissertation depends on considering
the entropy (or mutual information) of transformed variables. It
is therefore convenient for the initial developments to restrict
the derivations to the class of information preserving
transformations.

If the mutual information between a pair of random variables
is invariant under a transformation of those variables, it is not
intuitively obvious that the given transformation must be one-to-one,
or that the inverse transformation exists so that the input
vector can always be "recovered" if the output vector is known.
When studying random variables with continuous (or at least
piecewise continuous) probability density functions the constraint on
the transformation is made slightly more severe than merely being
information preserving, so as to include the "recovery" property.

The following definition will accomplish this.

Definition:

An information preserving transformation f from x to y is one
for which the equation defining the transformation of the probability
density functions [41],

$$ p_x(x) = p_y(f(x)) \left| \frac{\partial f}{\partial x}(x) \right| , \tag{2.4.1} $$

is defined and is valid for all values of x.

It is quite easy to show that when f(x) satisfies the
conditions of equation (2.4.1) it is a regular transformation, i.e.,
both f(x) and $f^{-1}(y)$ are of class $C^1$ and I(x;z) = I(f(x);z) =
I(y;z).

Alternately  an  equivalent  constraint  on  f(x)  is  that  the 
Jacobian  of  the  transformation  is  continuous  and  non-zero.  In  any 
case,  the  important  fact  is  that  when  the  conditions  on  f(x)  are 
defined,  in  either  manner,  f(x)  has  an  inverse  and  no  information 
about  the  input  is  lost  because  of  the  transformation. 

Actually, the true class of information invariant
transformations is much more inclusive than the class of functions defined
by the use of equation (2.4.1). This is because the broader class
of transformations also includes functions, F(·), that while not
having a unique inverse, do possess the property that for all sets
A and B, where $B = F^{-1}(A)$, it is true that

$$ p(z/y \in A) = p(z/x_1 \in B) = p(z/x_2 \in B) \qquad \forall \; x_1, x_2 \in B . \tag{2.4.2} $$




Since I(z;y) = H(z) - H(z/y), it must follow that for any
transformation

$$ y = f(x) $$

satisfying equation (2.4.2)

$$ I(z;x) = I(z;f(x)) = I(z;y) . $$

As an example of this type of function, consider

$$ y = x^2 $$

together with

$$ p(z/x) = 0 , \qquad x \le 0 . $$

Now let R be the continuous real line and A and B be the positive
half line (including zero). Then obviously

$$ p(z/y \in A) = p(z/x \in B) $$

and therefore, even though $f^{-1}(y)$ is not unique,

$$ I(z;x) = I(z;y) . $$

This is a "contrived" example and it is not likely that
transformations matching themselves so closely to the conditional density
p(z/x) so as to satisfy equation (2.4.2) will ever occur in practical
situations.

2.5  Entropy  of  Markoff  Sources 

In general, most of the random processes encountered in control
systems analysis are Markoff sources which may be considered as
being generated by the simple M-th order differencing process given
by:




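$$y_1 = \zeta_1$$

$$y_2 = a_{21} y_1 + b_{21} \zeta_1 + \zeta_2$$

$$\vdots$$

$$y_k = \sum_{i=k-M}^{k-1} a_{ki} y_i + \sum_{i=k-M}^{k-1} b_{ki} \zeta_i + \zeta_k , \quad k > M$$    (2.5.1)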


where the driving function $\{\zeta_k\}$ is a sequence of independent
random variables chosen with the necessary probability density
functions to achieve the desired density for the $y$ sequence. It
is not necessary that the $\zeta$ process be stationary or that the
differencing be time invariant. However, later on such
simplifications will be introduced in order to study the much more
common stationary markoff process.

For  convenience  of  notation  it  is  desirable  to  make  the 
following  definitions: 


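$$Y = [y_1, y_2, \ldots, y_K]^T , \qquad \zeta = [\zeta_1, \zeta_2, \ldots, \zeta_K]^T ,$$

and let $A$ be the $K \times K$ matrix, with ones on its diagonal and
zeros above it, that generates $Y$ from $\zeta$ according to
equation (2.5.1).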

$A$ is a lower triangular matrix, i.e., it is the matrix of a
physically realizable process, and its determinant is 1. Using
vector algebra, the generation of $\{y_k\}$ can be written simply as:

$$Y = A \zeta$$    (2.5.2)

The entropy of the $Y$ vector can now be determined directly
from the entropy of the generating sequence $\zeta$ using the
relationship

$$H_K(Y) = H_K(\zeta) + \log |\det (A)| = H_K(\zeta)$$

where the subscript $K$ is used to denote that the dimension of the
vector whose entropy is being described is $K$.

Since $\zeta$ is composed of independent random variables it follows
directly that

$$H_K(Y) = \sum_{k=1}^{K} H_1(\zeta_k) .$$    (2.5.3)


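Now if an additional value of $y$, namely $y_{K+1}$, is generated, then

$$H_{K+1}(Y) = \sum_{k=1}^{K+1} H_1(\zeta_k) = \sum_{k=1}^{K} H_1(\zeta_k) + H_1(\zeta_{K+1}) = H_K(Y) + H_1(\zeta_{K+1})$$    (2.5.4)

and

$$H_{K+1}(Y) = H_K(Y) + H_1(y_{K+1} / y_K, y_{K-1}, \ldots, y_1) .$$    (2.5.5)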


Therefore, these last two expressions, together with the markoff
property of the $y$ process, yield

$$H_1(y_{K+1} / y_K, \ldots, y_{K-M}) = H_1(\zeta_{K+1}) .$$    (2.5.6)


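When written in terms of the probability density functions
this equation is:

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} dy_{K+1} \cdots dy_{K-M}\; p(y_{K+1}, y_K, \ldots, y_{K-M}) \log \frac{p(y_K, \ldots, y_{K-M})}{p(y_{K+1}, y_K, \ldots, y_{K-M})} = \int_{-\infty}^{\infty} d\zeta_{K+1}\; p_\zeta(\zeta_{K+1}) \log \frac{1}{p_\zeta(\zeta_{K+1})} .$$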


In terms of the initial entropy of the $y$ sequence, and the
conditional increase in entropy for each additional $y$ generated,
the total entropy of the $Y$ vector is

$$H_K(Y) = \sum_{k=M+1}^{K} H_1(y_{k+1} / y_k, \ldots, y_{k-M}) + H_M(y_1, \ldots, y_M) .$$    (2.5.7)

Define the entropy of an Mth order markoff source as

$$H^*(y_{k+1} / Y) = H_1(y_{k+1} / y_k, \ldots, y_{k-M}) .$$    (2.5.8)




Then for the case when the probability distribution of $\zeta$ is
stationary:

$$\sum_{k=M+1}^{K} H^*(y_{k+1} / Y) + H_M(y_1, \ldots, y_M) = K\, H(\zeta) .$$    (2.5.9)

If the differencing process is time invariant, i.e., if
$b_{ki} = b_i$ and $a_{kj} = a_j$ for all $i$, $j$, $k > M$, then

$$H_K(Y) = (K-M)\, H^*(y_{K+1} / Y) + H_M(y_1, \ldots, y_M) .$$    (2.5.10)


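When the driving function $\zeta$ is Gaussian the $Y$ vector is also
Gaussian, so that the total entropy of the $y$ process is

$$H_K(Y) = \tfrac{1}{2} \log \left[ (2\pi e)^K \Delta_y \right]$$    (2.5.11)

where

$$\Delta_y = \det [E\{Y Y^T\}] = \det [R_{yy}]$$

$$R_{yy} = E\{Y Y^T\} = E\{A \zeta \zeta^T A^T\} = A R_{\zeta\zeta} A^T .$$

Then, since $\det (A) = 1$ and $R_{\zeta\zeta}$ is diagonal with entries
$\sigma_{\zeta_k}^2$,

$$H_K(Y) = \tfrac{1}{2} \sum_{k=1}^{K} \log \left( 2\pi e\, \sigma_{\zeta_k}^2 \right) = \sum_{k=1}^{K} H_1(\zeta_k)$$    (2.5.13)

which is, of course, identical to equation (2.5.3).
As before, for the first order markoff source:

$$H_1(y_K / y_{K-1}) = H_1(\zeta_K) .$$    (2.5.14)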
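
The Gaussian result above can be checked numerically. The following is a minimal sketch, not taken from the report: it assumes a pure autoregressive special case of (2.5.1) with all b coefficients set to zero, and the dimensions, coefficients and variances are hypothetical values drawn at random. It builds the unit lower triangular matrix A, forms R_yy = A R_zz A^T, and compares the entropy of Y with the sum of the entropies of the driving variables.

```python
import numpy as np

def gaussian_entropy(cov):
    """Entropy in nats of a zero-mean Gaussian vector with covariance cov."""
    cov = np.atleast_2d(cov)
    k = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (k * np.log(2.0 * np.pi * np.e) + logdet)

K, M = 6, 2                               # hypothetical vector length and markoff order
rng = np.random.default_rng(0)
var_zeta = rng.uniform(0.5, 2.0, size=K)  # hypothetical variances of the independent zeta_k

# Assumed special case: y_k = sum_i a_{ki} y_i + zeta_k with no b terms,
# i.e. (I - L) Y = zeta with L strictly lower triangular of bandwidth M.
L = np.zeros((K, K))
for k in range(K):
    for i in range(max(0, k - M), k):
        L[k, i] = rng.uniform(-0.9, 0.9)
A = np.linalg.inv(np.eye(K) - L)          # unit lower triangular, so det(A) = 1

R_zeta = np.diag(var_zeta)
R_yy = A @ R_zeta @ A.T

h_y = gaussian_entropy(R_yy)
h_zeta_sum = sum(gaussian_entropy(v) for v in var_zeta)
print("det(A)          =", np.linalg.det(A))
print("H_K(Y)          =", h_y)
print("sum H_1(zeta_k) =", h_zeta_sum)
```

Because det(A) = 1, the two entropies agree to rounding error regardless of the particular coefficients chosen.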

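When the distribution is stationary and the differencing process
is time invariant, the conditional entropy for very large $K$ is
found from

$$E\{y_K y_{K-1}\} = \sigma_y^2\, a$$    (2.5.15)

and

$$\lim_{K \to \infty} E\{y_K^2\} = \sigma_y^2 = \frac{\sigma_\zeta^2}{1 - a^2}$$

so that

$$H_1(y_K / y_{K-1}) = \tfrac{1}{2} \log 2\pi e\, \sigma_\zeta^2 = \tfrac{1}{2} \log 2\pi e\, \sigma_y^2 (1 - a^2) .$$    (2.5.16)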
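
Equation (2.5.16) can be checked for a stationary first order Gaussian source. The sketch below assumes the recursion y_k = a y_{k-1} + zeta_k with hypothetical values for a and the driving variance; it iterates the variance to steady state and computes the conditional entropy from the covariance of the pair (y_K, y_{K-1}).

```python
import numpy as np

a = 0.8          # hypothetical first order coefficient, |a| < 1
var_zeta = 0.5   # hypothetical variance of the driving sequence

# Variance recursion for y_k = a*y_{k-1} + zeta_k, started from y_1 = zeta_1.
var_y = var_zeta
for _ in range(200):
    var_y = a * a * var_y + var_zeta

# Steady-state covariance of (y_K, y_{K-1}); the off-diagonal term is (2.5.15).
cov = np.array([[var_y, a * var_y],
                [a * var_y, var_y]])

h_pair = 0.5 * np.log((2 * np.pi * np.e) ** 2 * np.linalg.det(cov))
h_single = 0.5 * np.log(2 * np.pi * np.e * var_y)
h_conditional = h_pair - h_single                       # H(y_K / y_{K-1})

h_from_2516 = 0.5 * np.log(2 * np.pi * np.e * var_y * (1 - a * a))
h_zeta = 0.5 * np.log(2 * np.pi * np.e * var_zeta)      # H_1(zeta)
print(h_conditional, h_from_2516, h_zeta)
```

All three printed values coincide, which is the first order statement (2.5.14) in its stationary form.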


2.6  The  Channel  Capacity  of  a Sensor 

All  real  world  systems  must  make  use  of  one  or  more  measuring 
devices  in  order  to  be  able  to  have  an  absolute  (or  relative) 
assessment  of  performance  so  that  the  tasks  which  are  performed  may 
be  judged.  It  seems  intuitively  true  that  in  any  real  time  control 
problem,  feed-forward  or  feedback,  it  must  be  the  measuring  device 
which  ultimately  limits  the  performance  of  the  control  process.  If 
data could be taken quickly, efficiently and accurately, then 100
percent precise commands could be given and zero error performance
would be achieved. Of course no sensor can provide data that good;
they  all  provide  data  that  is  distorted,  noisy,  late  and  sometimes 
missing.  Until  this  present  report,  feedback  control  system  theory 
has  lacked  techniques  for  assessing  the  worth  of  sensors.  There 
has  been  no  good  criterion  by  which  sensors  could  be  judged  and  no 
scale  on  which  alternate  sensors  could  be  compared. 

The significant new results in this dissertation were obtained
by no longer treating the sensor as a data gathering device having
certain variance properties, but rather as an information
transmitting device having channel properties. Such a view is not
unnatural. It is a highly respectable assumption when made by coding
theorists analyzing communication systems, and moreover much can be
said for the analogy between control and communication.

Of  course,  the  worth  of  any  proposed  property  of  a system  depends 
on  the  knowledge  it  conveys  to  the  user,  and  given  that  a sensor  has 
a channel  property  it  is  not  surprising  to  learn  (as  will  be  proven 
in  the  later  chapters)  that  this  property  can  be  used  to  develop  a 
measure  of  the  system  performance.  The  channel  capacity  of  the 
sensor  becomes  the  factor  which  limits  the  flow  of  information  around 
a system  and  therefore  the  factor  which  can  measure  the  effectiveness 
of the given sensor path in accomplishing some end.

The  channel  property  studied  in  this  dissertation  will  be  that 
of  the  mutual  information  between  sensor  input  and  output  and  will 
be  referred  to  as  the  "Sensor  Channel  Transmittance."  This 
quantity closely resembles Shannon's channel capacity, but, for the
situations examined here, the luxury of a controllable signal
probability density function is not available and so the two concepts
are different and are treated as such.


2.7  The  Sensor  Channel  Transmittance 

The  sensor  or  measuring  device  that  is  used  to  determine  the 
state  of  a dynamical  system  can  be  regarded  as  a noisy  communication 
channel.  Figure  2.1  is  a possible  representation  for  such  a channel. 


1. $S = B(Y)$

2. $Z = C(W)$

Figure 2.1. Interpretation of a Sensor as a Communication Channel.

The additive noise term, $\{n_k\}$, is usually an Rth order markoff
process which is independent of the measured Mth order markoff process
$\{y_k\}$. However, no markoffian properties will be assumed at this
point. The signal shaping operator $B$ is linear and physically
realizable, and so it may be represented as a causal matrix having
entries $b_{ij}$, such that $b_{ij} = 0$ for all $i > j$. The only
requirement on this matrix operator is that $\det [B]$ be non-zero.
This requirement states mathematically the constraint that it always
be possible to completely recover the signal $y$ when the
measurement noise is known.

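The output filter $C$ can be of a more general nature and, for
convenience, is taken as a nonlinear causal operator of the form

$$Z = C(W) = \begin{bmatrix} c_1(w_1) \\ c_2(w_1, w_2) \\ c_3(w_1, w_2, w_3) \\ \vdots \\ c_K(w_1, \ldots, w_K) \end{bmatrix}$$    (2.7.1)

where

$$Z = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_K \end{bmatrix} \quad \text{and} \quad W = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_K \end{bmatrix} .$$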


The "recovery" constraint on $C(W)$ requires that the Jacobian
of the transformation, $J(W)$, must be non-zero for all $W$. The
Jacobian is given by the expression

$$J(W) \triangleq \det [\Gamma(W)]$$

where $\Gamma(W)$ is the matrix of partial derivatives given by

$$\Gamma(W) = \left[ \frac{\partial c_i(w_1, \ldots, w_K)}{\partial w_j} \right] , \quad i, j = 1, \ldots, K .$$


In this dissertation the mutual information between the $K$
dimensional input and output vectors is called "The Sensor Channel
Transmittance." This quantity is given as:

$$I_K(Y;Z) = H_K(Z) - H_K(Z/Y)$$    (2.7.2)

or

$$I_K(Y;Z) = \int p_z(Z) \log \frac{1}{p_z(Z)}\, dZ - \iint p(Z,Y) \log \frac{1}{p(Z/Y)}\, dZ\, dY$$    (2.7.3)

where the integrals are of Kth order and are taken over the whole $K$
dimensional space.


The  probability  density  functions  for  S and  W are  related  to 
the  input  and  output  density  functions  through  the  transformations: 


i)  Z = C(W) 

ii)  $p_z(C(W)) = p_w(W)\, |J(W)|^{-1}$  (2.7.4)

iii)  Py(Y)  = ps(B-1Y)  B. 

The mutual information when written in terms of $S$ and $W$ becomes:

$$I_K(Y;Z) = \int p_w(W) \log \frac{1}{p_w(W)}\, dW + \int p_w(W) \log |J(W)|\, dW - \iint p_{w,s}(W,S) \log \frac{1}{p(W/S)}\, dW\, dS - \iint p_{w,s}(W,S) \log |J(W)|\, dW\, dS$$

or

$$I_K(Y;Z) = H_K(W) - H_K(W/S) .$$    (2.7.5)


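The conditional entropy of $W$ given $S$ is simply $H_K(N)$. This
may be shown as follows:

$$H_K(W/S) = \int_{-\infty}^{\infty} dS \int_{-\infty}^{\infty} dW\; p_{w,s}(W,S) \log \frac{1}{p(W/S)}$$    (2.7.6)

where

$$W = S + N$$

$$p(W/S) = p_n(W - S)$$    (2.7.7)

$$p_{w,s}(W,S) = p_{n,s}(W - S, S) = p_s(S)\, p_n(W - S)$$    (2.7.8)

and $p_n(N)$ is the probability density function of the $K$
dimensional noise vector. Making the proper substitutions reduces the
conditional entropy to

$$H_K(W/S) = \int_{-\infty}^{\infty} dS \int_{-\infty}^{\infty} dW\; p_{n,s}(W - S, S) \log \frac{1}{p_n(W - S)} = \int_{-\infty}^{\infty} p_s(S)\, dS \int_{-\infty}^{\infty} dN\; p_n(N) \log \frac{1}{p_n(N)} = H_K(N) .$$    (2.7.9)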

Using this result in the expression for the mutual information yields:

$$I_K(Y;Z) = H_K(S+N) - H_K(N) .$$    (2.7.10)




Now using property 2 (Table I) of entropy, it is true that

$$H_K(C(N)) = H_K(N) + \int_{-\infty}^{\infty} dN\; p_n(N) \log |J(N)| .$$

In addition,

$$H_K(C(S+N)) = H_K(S+N) + \int_{-\infty}^{\infty} dW\; p_w(W) \log |J(W)|$$

$$= H_K(S+N) + \int_{-\infty}^{\infty} dS \int_{-\infty}^{\infty} dW\; p_{w,s}(W,S) \log |J(W)|$$

$$= H_K(S+N) + \int_{-\infty}^{\infty} dS\; p_s(S) \int_{-\infty}^{\infty} dW\; p_n(W - S) \log |J(W)|$$

$$= H_K(S+N) + \int_{-\infty}^{\infty} dN\; p_n(N) \log |J(N)| .$$

Then from these two equations and 2.7.10, it follows that

$$H_K(C(S+N)) - H_K(C(N)) = H_K(S+N) - H_K(N)$$

$$H_K(C(S+N)) - H_K(C(N)) = I_K(Y;Z) .$$    (2.7.11)
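
Equations (2.7.10) and (2.7.11) can be checked in closed form for Gaussian vectors. The sketch below is illustrative only: the covariance matrices are hypothetical random choices, the shaped signal S and the noise N are assumed to be independent zero-mean Gaussians, and the output filter C is assumed to be a linear, lower triangular (causal) and non-singular matrix, which is only one special case of the nonlinear operator in (2.7.1).

```python
import numpy as np

def gaussian_entropy(cov):
    """Entropy in nats of a zero-mean Gaussian vector with covariance cov."""
    cov = np.atleast_2d(cov)
    k = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (k * np.log(2.0 * np.pi * np.e) + logdet)

def random_covariance(k, rng):
    """A hypothetical positive definite covariance matrix."""
    g = rng.standard_normal((k, k))
    return g @ g.T + k * np.eye(k)

rng = np.random.default_rng(1)
K = 4
R_s = random_covariance(K, rng)   # covariance of the shaped signal S
R_n = random_covariance(K, rng)   # covariance of the noise N
R_w = R_s + R_n                   # covariance of W = S + N (S, N independent)

# (2.7.10): transmittance as an entropy difference.
transmittance = gaussian_entropy(R_w) - gaussian_entropy(R_n)

# Direct definition I(S;W) = H(S) + H(W) - H(S,W) using the joint covariance.
R_joint = np.block([[R_s, R_s], [R_s, R_w]])
direct = gaussian_entropy(R_s) + gaussian_entropy(R_w) - gaussian_entropy(R_joint)

# (2.7.11): the same difference measured at the output of a linear causal filter C.
C = np.tril(rng.standard_normal((K, K)), k=-1) + np.diag(1.0 + rng.random(K))
filtered = gaussian_entropy(C @ R_w @ C.T) - gaussian_entropy(C @ R_n @ C.T)

print(transmittance, direct, filtered)
```

The three numbers agree, so in this special case the transmittance can be read from the filtered outputs alone, without using C explicitly.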

This  form  for  the  Channel  Transmittance  is  more  important  than  2.7. 
because it relates directly to readily accessible quantities, i.e.,
the sensor output for no signal input, $C(N)$, and the sensor output
for a signal input, $C(S+N)$. Thus, according to equation (2.7.11),
$I_K(Y;Z)$ can always be found for any sensor, even by simulation
if necessary. So under laboratory conditions $I_K(Y;Z)$ may be
determined experimentally, without any knowledge of $B$ or $C$ being
required for this determination. This, coupled with the fact that
all the bounding theorems on system performance subsequently proved
in the body of this dissertation always depend directly on
$I_K(Y;Z)$ [the term $H_K(N)$ never appears without the corresponding
term $H_K(S+N)$], means that it is never really necessary to know
individually any of the sensor parameters [$N$, $H_K(N)$, $H_K(S+N)$,
etc.] but only $I_K(Y;Z)$. Thus sensors which defy representation in
conventional terms and which do not have suitable models can be
completely described for the purposes of this discussion simply by a
channel transmittance.

Recognizing that $C(S+N)$ is the sensor output (or the signal
measurements) it will not be surprising to learn below that the
difference between the entropy of the output for signal and
no-signal conditions

$$H_K(C(S+N)) - H_K(C(N)) = \text{the information in the output about the input}$$    (2.7.12)

can be interpreted as that amount of information in the output that
is actually effective for reducing the entropy (uncertainty) of
estimates of the input. Obviously if any estimates of a function of
$Y$ are to be made using the noisy measurement $Z$, it is the Channel
Transmittance $I_K(Y;Z)$ that will measure how successful the
estimation procedure is. The reader should be cautioned that this is
not a completely new interpretation of $I_K(Y;Z)$. If the sensor of
Figure 2.1 were called a channel and if $Y$ took on only a finite
number of values and were called a code, then there would be no
objection to calling $I_K(Y;Z)$ the effectiveness of the decoding
procedure. What is new is that this approach is now being applied to
the estimation of continuous-state processes.




Since mutual information is always non-negative, equation
2.7.10 yields the important inequality

$$H_K(S+N) \ge H_K(N) .$$    (2.7.13)

In a similar manner, the mutual information $I_K(N;Z)$ yields an
equally important inequality

$$I_K(N;Z) = H_K(S+N) - H_K(S) \ge 0$$    (2.7.14)

or

$$H_K(S+N) \ge H_K(S) .$$    (2.7.15)


2.8  Entropy  of  the  Sum  Vector  (S+N)

For  many  of  the  derivations  to  follow,  it  is  necessary  to  know 
the  conditional  entropy  of  the  sum  of  the  two  independently  distributed 
random  vectors  S and  N,  given  all  the  past  values  of  the  sum.  Since 
the  probability  densities  of  S and  N are  usually  known  in  advance  it 
is  not  difficult  to  conceive  of  using: 


$$p(S+N) = \int d\alpha\; p_s(\alpha)\, p_n(S+N-\alpha)$$    (2.8.1)


in  order  to  determine  the  desired  entropy  function. 
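
For a single scalar sample, the direct route through equation (2.8.1) can be sketched numerically as follows. This is an illustration under stated assumptions, not the report's procedure: the two densities are taken to be zero-mean Gaussians with hypothetical variances so that the answer can be compared with a known closed form, and the integrals are replaced by sums on a uniform grid.

```python
import numpy as np

def density_entropy(p, dx):
    """Differential entropy in nats of a density sampled on a uniform grid."""
    p = p[p > 0]                       # 0 log 0 is taken as 0
    return float(-np.sum(p * np.log(p)) * dx)

def gaussian_pdf(x, var):
    return np.exp(-x * x / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

dx = 0.01
grid = np.arange(-20.0, 20.0 + dx / 2, dx)   # symmetric grid with an odd number of points

var_s, var_n = 1.0, 0.25                     # hypothetical signal and noise variances
p_s = gaussian_pdf(grid, var_s)
p_n = gaussian_pdf(grid, var_n)

# Discrete form of (2.8.1): p_{s+n}(w) = integral of p_s(alpha) p_n(w - alpha) d alpha.
p_sum = np.convolve(p_s, p_n, mode="same") * dx

h_numeric = density_entropy(p_sum, dx)
h_exact = 0.5 * np.log(2.0 * np.pi * np.e * (var_s + var_n))
print("total probability:", np.sum(p_sum) * dx)
print("entropy by convolution:", h_numeric, " closed form:", h_exact)
```

For non-Gaussian densities only the two lines defining p_s and p_n would change; the closed-form comparison would then no longer be available.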

However it is sometimes desirable to be familiar with the
properties of the conditional entropy without being obliged to carry
out the convolution and entropy integrals directly. An expression
similar to equation 2.5.6 is clearly desirable, but unfortunately the
sum of two markoff processes is no longer Markovian and no such
simple equation exists. However, when the random variables involved
are stationary and $N$ is markoff, then it is possible to determine a
useful asymptotic description of the conditional entropy. It follows
from the use of equation (2.3.5) that the mutual information
increases monotonically with the number of samples used, i.e.,

$$I_K(S;S+N) = I(s_1, s_2, \ldots, s_K ; (s+n)_1, (s+n)_2, \ldots, (s+n)_K)$$

$$\le I(s_1, s_2, \ldots, s_K, s_{K+1} ; (s+n)_1, (s+n)_2, \ldots, (s+n)_K)$$

$$\le I(s_1, s_2, \ldots, s_{K+1} ; (s+n)_1, \ldots, (s+n)_K, (s+n)_{K+1})$$

$$= I_{K+1}(S;S+N) .$$    (2.8.2)


Now expanding each of the mutual information terms according to a
modification of equation (2.7.10),

$$I_K(S;S+N) = H_K(S+N) - H_K(N) ,$$

which leads to

$$H_K(S+N) - H_K(N) \le H_{K+1}(S+N) - H_{K+1}(N) .$$    (2.8.3)


It has already been determined (see Section 2.5, equation (2.5.3))
that the entropy of a $K$ dimensional markoff noise vector is:

$$H_K(N) = \sum_{k=1}^{K} H_1(\zeta_k) .$$    (2.8.4)

This simplifies the above inequality to yield:

$$H_K(S+N) \le H_1[(s+n)_{K+1} / (s+n)_K, (s+n)_{K-1}, \ldots, (s+n)_1] + H_K(S+N) - H_1(\zeta_{K+1}) ,$$

or finally

$$H_1[(s+n)_{K+1} / (S+N)_K] \ge H_1(\zeta_{K+1}) .$$

In the stationary case this result, together with equation (2.2.19),
leads to

$$H_1[(s+n)_{K+2} / (s+n)_{K+1}, \ldots, (s+n)_2] = H_1[(s+n)_{K+1} / (s+n)_K, \ldots, (s+n)_1]$$

$$\ge H_1[(s+n)_{K+2} / (s+n)_{K+1}, \ldots, (s+n)_1] \ge H_1(\zeta) .$$    (2.8.5)

Thus the sequence $H_1[(s+n)_{K+1} / (S+N)_K]$ is monotonically
decreasing and is bounded below by $H_1(\zeta)$. It is therefore
convergent to a limit

$$\lim_{K \to \infty} H_1[(s+n)_{K+1} / (S+N)_K] = H_1[(s_\infty + n_\infty) / (S+N)] \ge H_1(\zeta)$$    (2.8.6)

where in the limit $S+N$ becomes an infinite dimensional vector. An
alternate and useful expression for the convergent property is:

Given any $\epsilon > 0$ there exists a $\kappa$ sufficiently large so
that for all $K > \kappa$,

$$\left| H_1[(s+n)_{K+1} / (S+N)_K] - H_1[(s_\infty + n_\infty) / (S+N)] \right| < \epsilon .$$    (2.8.7)

This relationship will prove useful below for studying the incremental
change in Channel Transmittance.


2.9  Incremental  Channel  Transmittance 

According to equation (2.7.10) the $K$ dimensional Sensor Channel
Transmittance is

$$I_K(Y;Z) = H_K(S+N) - H_K(N) .$$    (2.9.1)


As more data is taken and $K$ increases, the Channel Transmittance
(according to Theorem 2.3.1) must also increase and, in general, it
increases without bound. An interesting quantity to study is
$\Delta_K$, the Incremental Channel Transmittance, defined as

$$\Delta_K = I_{K+1}(Y;Z) - I_K(Y;Z) .$$    (2.9.2)

It follows from equation (2.8.2) that $\Delta_K$ is never negative, and
it represents the amount of "new" information obtained about all the
signals (including the latest one) that may be derived from an
additional measurement. After a sufficient length of time, the
new data does not provide any new information about the oldest signal
samples and therefore $\Delta_K$ approaches a steady-state limit
$\Delta_\infty$. In the stationary case this statement is proven as
follows.

From equations (2.9.1) and (2.9.2),

$$\Delta_K = H_{K+1}(S+N) - H_{K+1}(N) - [H_K(S+N) - H_K(N)] .$$

Using equation (2.8.4) and

$$H_{K+1}(S+N) = H_K(S+N) + H_1[(s+n)_{K+1} / (S+N)_K] ,$$

the equation for $\Delta_K$ becomes

$$\Delta_K = H_1[(s+n)_{K+1} / (S+N)_K] - H_1(\zeta_{K+1}) .$$    (2.9.3)

Then by applying equation (2.8.7) the convergent property of
$\Delta_K$ may be stated as:

Given any $\epsilon > 0$ there exists a $\kappa$ sufficiently large so
that for all $K > \kappa$

$$| \Delta_K - \Delta_\infty | \le \epsilon .$$    (2.9.4)

The asymptotically stationary Incremental Transmittance,
$\Delta_\infty$, will prove useful in Chapter Three for showing that in
stationary markoff estimation problems the average entropy of the
error approaches a steady-state value.
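
The behaviour of the Incremental Channel Transmittance can be illustrated for one simple stationary case. The sketch below is an assumption-laden example rather than the report's own calculation: the signal is taken to be a stationary first order Gaussian markoff process with correlation a^|i-j|, the noise is taken to be white and Gaussian, and the numerical values are hypothetical. It evaluates the K dimensional transmittance of (2.9.1) from covariance determinants and differences it as in (2.9.2).

```python
import numpy as np

def transmittance(K, a, var_s, var_n):
    """I_K = H_K(S+N) - H_K(N) in nats for an AR(1) Gaussian signal in white Gaussian noise."""
    idx = np.arange(K)
    R_s = var_s * a ** np.abs(idx[:, None] - idx[None, :])   # stationary signal covariance
    R_w = R_s + var_n * np.eye(K)                            # covariance of S + N
    _, logdet_w = np.linalg.slogdet(R_w)
    # The (2*pi*e)^K factors of the two entropies cancel in the difference.
    return 0.5 * (logdet_w - K * np.log(var_n))

a, var_s, var_n = 0.9, 1.0, 0.5      # hypothetical parameters
I = np.array([transmittance(K, a, var_s, var_n) for K in range(1, 42)])
delta = np.diff(I)                   # delta[K-1] = I_{K+1} - I_K, equation (2.9.2)

print("first increments:", delta[:4])
print("last increments: ", delta[-3:])
```

In this example the transmittance keeps growing with K while the increments decrease and settle to a constant, in line with (2.9.4).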




CHAPTER  THREE 
THE  ESTIMATION  PROBLEM 


3.1  Summary

Except for the requirements of real time data processing,
feedback control systems are very similar to estimating systems.
Therefore it is not illogical to assume that the investigation of
feedback should begin with an investigation of estimation, especially
since this type of problem has received much attention and several
mean square error solutions are known to exist [34, 35].

Useful entropy analysis is not just a matter of writing the
system equations, transforming the corresponding probability density
functions and then determining the various signal entropies. The
scalar feedback problem studied in 3.2 best demonstrates the
difficulties involved in this approach. The major result, that of
bounding the error entropy from below, is

$$H(x) \ge H\left( \tfrac{1}{1+a}\, y \right) .$$

If "a" were known, then this bound could be determined; but then if
"a" were known $H(x)$ could be determined exactly. Thus in this example
no useful purpose has been served by using the entropy measure. This
setback is still further incentive for first developing entropy
techniques by solving the simpler problem of estimating before
attempting to solve the very complex feedback control problem.

The critical step for initiating entropy analysis, the study of
the mutual information between the system error and the sensor output,
is investigated in 3.4 (after a suitable problem definition is given
in 3.3). The most useful result of the mutual information approach
is that it leads to a bound on the estimation error entropy that is
independent of the estimating filter and is a function solely of
the known properties of the system input and the system sensor.

In 3.5 a corollary is proven that is a reinterpretation of the
estimation theorem through the use of the concept of a Sensor Channel
Transmittance. This corollary completely relaxes all of the
constraints on the form that the estimating system may take on. The
filters "C" and "D" need not preserve information and the only
description of the sensor that is now required for an entropy analysis
is its Channel Transmittance, $I(Y;Z)$.

A simple example, presented in 3.6, is used to show that in the
Gaussian-linear-estimation problem, entropy solutions lead to
conventional results. With this experience as background, another
corollary to the estimation theorem is proven, the results of which
demonstrate clearly the basic similarities between variance and
entropy as uncertainty measures. For linear Gaussian problems
variance and entropy are completely interchangeable. The remainder
of the chapter is devoted to understanding the implications of the
results as they apply to the change in error entropy with additional
measurements.

3.2  Naive  Application  of  Entropy  Analysis 

The tracking problem produces an interesting example of how
undisciplined use of the entropy criterion can lead to difficulties.
Consider the scalar feedback estimation problem shown in Figure 3.1.
The following theorem applies:

Theorem 3.2

The entropy of $x$ is bounded by:

$$H(x) \ge H\left( \tfrac{a}{1+a}\, n \right)$$    (3.2.1)

i.e., the entropy of $x$ always exceeds the entropy of the component of
the noise that appears at the output.

Proof:

Since

$$x = \tfrac{1}{1+a}\, y - \tfrac{a}{1+a}\, n$$

$$y = y ,$$

it follows that the transformation of the probability density functions
is

$$p_{xy}(x,y) = p_{ny}(n,y) \left( \tfrac{1+a}{a} \right) .$$

So that using property 2 (Table I), the joint entropy of $x$ and $y$ may
be found in terms of the entropy of $y$ and $n$, i.e.,

$$H(x,y) = H(n,y) + \log \tfrac{a}{1+a}$$

$$= H(y) + H(n) + \log \tfrac{a}{1+a}$$

$$= H(y) + H\left( \tfrac{a}{1+a}\, n \right) .$$

Now examine the mutual information between $x$ and $y$:

$$I(x;y) = H(x) + H(y) - H(x,y)$$

$$= H(x) + H(y) - H(y) - H\left( \tfrac{a}{1+a}\, n \right) ,$$

or

$$H(x) = I(x;y) + H\left( \tfrac{a}{1+a}\, n \right) .$$

Since $I(x;y)$ is never negative, the theorem is proven.

In an analogous manner, using $I(x;n)$, it is possible to prove
that the error entropy always exceeds the entropy of the closed loop
component of the signal at the output, i.e.,

$$H(x) \ge H\left( \tfrac{1}{1+a}\, y \right) .$$

It would appear that these bounds on error entropy would have
strong applications for system analysis. Unfortunately they do not.
The reason is that even if the minimum possible error entropy occurred
with satisfaction of the equality in equation 3.2.1 (and it definitely
does not), evaluation of the quantity $H\left( \tfrac{a}{1+a}\, n \right)$
still depends on determining the value of "a" that achieves the
minimum, and if "a" were available, entropy analysis would have no
advantage over the actual calculation of the properties of the true
error.

In actuality $H\left( \tfrac{a}{1+a}\, n \right)$ [or
$H\left( \tfrac{1}{1+a}\, y \right)$] have little bearing on the true
minimum error entropy that may be achieved by this system. A typical
relationship between "a" and error entropy is shown in Figure 3.2.
Thus, there is no simple way to find either "a" or $H(x)$ from this
particular application of entropy for feedback analysis, even though
two apparently very nice results were obtained in the process.

3.3  The  Estimation  Problem,  Description

As the first useful step in the study of the use of information
theory, it is instructive to investigate the elementary estimation
problem shown in Figure 3.3. A random signal sequence taking on the
values $y_k$, $k = 1, 2, \ldots, K$ is processed by the known dynamical
system $D$. $D$ may be nonlinear and time varying and it may also be
non-realizable in the sense that the system output could depend on
future values of the input. A suitable vector representation for this
operation can be written as:

U = D(Y) , 

where the $U$ and $Y$ vectors are $K$ dimensional vectors whose
components are the members of the corresponding random sequence, i.e.,

$$D(Y) = \begin{bmatrix} d_1(y_1, y_2, \ldots, y_K) \\ d_2(y_1, y_2, \ldots, y_K) \\ \vdots \\ d_K(y_1, y_2, \ldots, y_K) \end{bmatrix} = \begin{bmatrix} d_1(Y) \\ d_2(Y) \\ \vdots \\ d_K(Y) \end{bmatrix} .$$

If $d_k(y_1, y_2, \ldots, y_k, \ldots, y_K)$ is independent of all $y_i$
for $i \ge k+1$, then the system is called causal.




1. $U = D(Y)$

2. $Z = C(B(Y) + N)$

3. $V = F(Z)$

4. $X = U - V$

Figure 3.3. The Estimation Problem


At the same time, the sequence $\{y_k\}$ is measured by the noisy
sensor consisting of a nonlinear pre-filter, $B$, the corrupting
additive noise $N$ and the nonlinear post-filter $C$. At this point in
the development, no other constraints are placed on the forms of $B$,
$C$, or $N$, other than that the noise process $\{n_k\}$ is independent
of the signal process $\{y_k\}$, and that $B$, $C$, and $D$ be
information preserving in the sense of section 2.4. The measurement
vector $Z$ is then processed by the estimation filter $F$ to produce
the random vector $V$. The optimum filter $F(Z) = \hat{F}(Z)$ is chosen
so that $V$ is the best estimate of the processed signal, $U$, in the
sense of minimum error entropy. Again, at this point $F$ is not
constrained to have any particular form other than that of
information preservation.

Suppose that this estimating system is optimized (or will be
optimized). A natural question is, "What will the system performance
be?" One of the severest handicaps of the currently available
analysis techniques is that the answer to this question requires
determining the optimizing filter, and this filter is not easily found.
If the problem is one of preliminary design analysis of complicated
systems, then not being able to easily calculate the optimum
performance adds greatly to the design problem and obscures the
importance of certain system parameters. An advantage of the
information theory approach developed herein is that it provides the
"back door" to system performance evaluation without solving the
filter problem directly. When the estimation problem has the form
shown in Figure 3.3, the following theorem is applicable.




3.4  The  Entropy  Theorem  for  Estimation
Theorem  3.4:

1. For the general estimation problem shown in Figure 3.3,
and an arbitrary filter function, $F(Z)$, if all the transformations
are information preserving, then the entropy of the error vector $X$
always satisfies the inequality:

$$H(X) \ge H(D(Y)) + H(N) - H(B(Y) + N) = H_o ,$$    (3.4.1)

where $H_o$ is independent of $F(Z)$. In other words, the reduction in
the processed signal entropy, $H(D(Y)) - H(X)$, due to feed-forward
estimation, cannot exceed the Sensor Channel Transmittance, $I(Y;Z)$,
where

$$I(Y;W) = I(Y;Z) = H(B(Y) + N) - H(N) .$$

2. Minimizing the mutual information $I(X;W)$ is equivalent to
minimizing the error vector entropy.

3. The minimum error entropy occurs when $I(X;W) = 0$. This
minimum entropy is

$$H(X) \big|_{\min} = H_o = H(D(Y)) + H(N) - H(B(Y) + N)$$    (3.4.2)

and is attainable if the optimum filter $\hat{F}$ can be chosen so that
the random vectors $X$ and $W$ are independent.
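
The content of the theorem can be made concrete with a scalar Gaussian special case. The sketch below is only an illustration under assumptions that are not part of the theorem: D, B and C are taken to be identities, the filter is restricted to a linear gain F(Z) = gZ, and the variances are hypothetical. It evaluates the bound H_o of (3.4.1), the error entropy for a range of gains, and the error-measurement covariance at the gain that minimizes the error variance.

```python
import numpy as np

def h(var):
    """Entropy in nats of a scalar zero-mean Gaussian with variance var."""
    return 0.5 * np.log(2.0 * np.pi * np.e * var)

var_y, var_n = 2.0, 0.5          # hypothetical signal and noise variances

# Bound of (3.4.1) with D = B = identity: H_o = H(Y) + H(N) - H(Y + N).
H_o = h(var_y) + h(var_n) - h(var_y + var_n)

def error_entropy(g):
    """H(X) for X = Y - g*(Y + N), i.e. the linear filter F(Z) = g*Z."""
    return h((1.0 - g) ** 2 * var_y + g ** 2 * var_n)

def error_measurement_covariance(g):
    """E{X W} for W = Y + N; zero covariance means independence for Gaussians."""
    return (1.0 - g) * var_y - g * var_n

gains = np.linspace(0.05, 1.5, 30)
entropies = np.array([error_entropy(g) for g in gains])
g_opt = var_y / (var_y + var_n)  # minimum variance gain

print("H_o              =", H_o)
print("min over grid    =", entropies.min())
print("H(X) at g_opt    =", error_entropy(g_opt))
print("E{XW} at g_opt   =", error_measurement_covariance(g_opt))
```

In this special case every gain gives an error entropy at or above H_o, and the bound is met exactly at the gain for which the error is uncorrelated with, and hence independent of, the measurement.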

The proof of this theorem is based on considering the mutual
information between the measurement vector, $W$, and the error vector
$X$. It is intuitively obvious that in the sense of entropy, if the
system is optimized [$H(X)$ is minimum] then $V$ cannot contain any
irreducible information about the error $X$, i.e., $I(X;V)$ must be a
minimum. Since $I(X;V) = I(X;W)$ for information preservation, it
follows that this implies that the mutual information between the
error and the measurements is a minimum. If it is not a minimum
then $Z$ could be reprocessed and this additional information removed,
thus reducing the entropy of $X$ further. This intuitive notion that
the minimum error entropy corresponds to minimum "error-measurements"
mutual information will be proven.

Proof: 

The mutual information between $X$ and $W$ is defined as:

$$I(X;W) = \int_{-\infty}^{\infty} dW \int_{-\infty}^{\infty} dX\; p_{xw}(X,W) \log \frac{p_{xw}(X,W)}{p_x(X)\, p_w(W)}$$    (3.4.3)

where $p_{xw}(X,W)$ is a function of $2K$ arguments and each integral is
$K$ dimensional, i.e., $dX = dx_1\, dx_2 \ldots dx_K$. According to
Table I, Property 15, this mutual information may be written in terms
of the individual entropies of $X$ and $W$ as

I(X;W)  = H(X)  - H(X,W)  + H(W)  . (3.4.4) 

The  vector  W is  given  by 
W = N + B(Y) , 
so  it  follows  that 

H(W)  = H(N  + B(Y) ) . 

If  the  distributions  of  N and  Y are  known,  then  at  least  in  theory 
H(W)  may  always  be  calculated. 


The joint entropy of $X$ and $W$ is found from examination of
the system equations

$$W = N + B(Y)$$    (3.4.5a)

$$X = -F(C(N + B(Y))) + D(Y) = -F_c(N + B(Y)) + D(Y)$$    (3.4.5b)

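The Jacobian $J$ for this transformation of $(N, Y)$ into $(W, X)$ is
given by the expression

$$J \triangleq J(N,Y) = \det \begin{bmatrix} \partial_n N & \partial_y B(Y) \\ -\partial_n F_c & \partial_y D - \partial_y F_c \end{bmatrix} = \det \begin{bmatrix} I & \partial_y B(Y) \\ -\partial_n F_c & \partial_y D - \partial_y F_c \end{bmatrix}$$

$$= \det [\partial_y D - \partial_y F_c + (\partial_n F_c)(\partial_y B(Y))]$$    (3.4.6)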

where the entries in the partitioned matrix are defined in the
following manner:

$$\partial_n N = I \quad \text{(the identity matrix)}$$

$$\partial_y B(Y) = B' \quad \text{(with } B'_{ij} = \partial b_i(Y) / \partial y_j \text{)}$$

$$\partial_n F_c = \left[ \frac{\partial F_{c_i}(N + B(Y))}{\partial n_j} \right]$$

$$\partial_y F_c = \left[ \frac{\partial F_{c_i}(N + B(Y))}{\partial y_j} \right] = (\partial_n F_c)\, B'$$

$$\partial_y D \triangleq \left[ \frac{\partial d_i(Y)}{\partial y_j} \right] , \quad i, j = 1, \ldots, K .$$

Using this notation,

$$J(N,Y) = \det [\partial_y D - \partial_y F_c + (\partial_n F_c)\, B'] ,$$

but it is obvious that

$$\partial_y F_c = (\partial_n F_c)\, B' ,$$

so that finally,

$$J(N,Y) = \det [\partial_y D(Y)] .$$    (3.4.7)
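
The reduction of the partitioned determinant to DET of the D block alone can be checked numerically. In the sketch below the three K by K blocks are hypothetical random matrices standing in for the partial derivative matrices at one operating point; the only structure imposed is the chain rule relation between the two derivatives of F_c.

```python
import numpy as np

rng = np.random.default_rng(2)
K = 4

# Hypothetical stand-ins for the derivative matrices at one operating point.
B_prime = rng.standard_normal((K, K))   # d B(Y) / d Y
dn_Fc = rng.standard_normal((K, K))     # d F_c / d N
dy_D = rng.standard_normal((K, K))      # d D(Y) / d Y

# Chain rule: F_c depends on Y only through N + B(Y).
dy_Fc = dn_Fc @ B_prime

# Partitioned Jacobian matrix of the map (N, Y) -> (W, X), as in (3.4.6).
jacobian_matrix = np.block([[np.eye(K), B_prime],
                            [-dn_Fc, dy_D - dy_Fc]])

lhs = np.linalg.det(jacobian_matrix)
rhs = np.linalg.det(dy_D)
print(lhs, rhs)
```

The two determinants agree for any choice of the blocks, which reflects the fact that the filter terms cancel and only the derivative of D survives in (3.4.7).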


The Jacobian given in this manner may then be used to express the
probability relationship existing between the vectors $(N,Y)$ and
$(W,X)$.


The pertinent probability density function equations are:

$$p_{xw}(X,W)\, dX\, dW = p_{ny}(N,Y)\, dY\, dN = p_y(Y)\, p_n(N)\, dY\, dN$$    (3.4.8)

and

$$p_{xw}(X,W) = p_y(Y(X,W))\, p_n(N(X,W))\, |J(N,Y)|^{-1} .$$    (3.4.9)

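The joint entropy of $(X,W)$ is defined as

$$H(X,W) \triangleq \int_{-\infty}^{\infty} dX \int_{-\infty}^{\infty} dW\; p_{xw}(X,W) \log \frac{1}{p_{xw}(X,W)} .$$

When 3.4.8 and 3.4.9 are substituted into this expression it
becomes

$$H(X,W) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} dN\, dY\; p_n(N)\, p_y(Y) \log \frac{1}{p_n(N)\, p_y(Y)\, |\det [\partial_y D]|^{-1}} .$$    (3.4.10)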

If the log term is expanded and the indicated integrations
performed, the joint entropy can then be written as:

$$H(X,W) = H(Y) + E_y \{ \log |\det [\partial_y D]| \} + H(N)$$    (3.4.11)

where $E_y \{\cdot\}$ is the expectation operation taken with respect to
the random variable $y$, i.e.,

$$E_y \{ \alpha \} = \int dY\; \alpha\; p_y(Y) .$$    (3.4.12)

An important simplification of this equation is possible, if it is
noted that

$$H(D(Y)) = H(Y) + E_y \{ \log |\det [\partial_y D]| \} .$$    (3.4.13)

With  the  aid  of  equation  (3.4.13)  the  joint  entropy  of  X and  W 
can  be  expressed  in  terms  of  the  entropies  of  Y and  N, 


H(X,W)  = H(D(Y))  + H(N) . (3.4.14) 

The expressions for $H(W)$ and $H(X,W)$ may now be combined in equation
3.4.4 to yield the equation from which all the results of this
theorem follow:

$$I(X;W) = H(X) - H(D(Y)) - H(N) + H(B(Y) + N) .$$    (3.4.15)

Only $I(X;W)$ and $H(X)$ are functions of the estimation filter $F(Z)$;
therefore, minimum error entropy occurs for minimum mutual
information. Since mutual information is always a non-negative
quantity it must be true that

$$H(X) \ge H(D(Y)) + H(N) - H(B(Y) + N) .$$    (3.4.16)

The importance of this equation is that it relates the entropy of
X to the known entropies of the input quantities, without requiring
explicit knowledge of the estimating filter F(Z). In addition the
relationship is true for all possible filters. Let

\[ H(D(Y)) + H(N) - H(B(Y)+N) = H_0. \tag{3.4.17} \]

H_0 is a constant for a given sensor and a given dynamical system D
and is not a function of F(Z). Then from equation 3.4.16, it follows
that:

\[ H(X) \ge H_0. \tag{3.4.18} \]

H(X) is bounded below and can never be less than H_0.


If the equality in equation 3.4.18 can be achieved for some
choice of F(Z) then that filter must be the optimum entropy filter
because it causes H(X) to take on its minimum value. The equality
of equation 3.4.16 can be attained only if I(X;W) = 0, but this
occurs if and only if X and W are independent. Therefore the optimum
estimating filter, $\hat{F}(Z)$, is such that

\[ p_{xw}(X,W) = p_x(X)\,p_w(W) \tag{3.4.19} \]

with the result that

\[ \min_{F(Z)} \{H(X)\} = H(\hat{X}) = H_0. \tag{3.4.20} \]

It is impossible to overemphasize the fact that in either case,
that of determining the entropy lower bound for estimation with
suboptimal filters, or of determining the minimum entropy for optimum
estimation, it is not required to actually know either F(Z) or
$\hat{F}(Z)$.

The  requirement  that  X be  independent  of  W requires  that  all 
the  data  be  made  available  before  an  estimation  is  made.  This  is 
analogous  to  a similar  requirement  of  the  Shannon  coding  theorem, 
and  it  follows  that  if  minimum  errors  are  to  be  achieved  for  either 


optimum  estimation  or  optimum  coding  it  is  required  that  the  entire 




Transmittance  of  that  sensor,  it  is  easy  to  conjecture  that  only  the 
informational  properties  of  the  sensor  are  important  for  entropy 
analysis  and  the  particular  form  taken  by  the  sensor's  model  is  not 
significant.  This  relaxation  of  the  constraints  of  Theorem  3.4  is 
stated  rigorously  in  the  following  theorem. 

Theorem  3.5  (A  stronger  version  of  Theorem  3.4) 

If a signal vector Y is measured by a sensor having a Channel
Transmittance I(Y;Z), then the entropy of the error in estimating
D(Y) is H(X), and H(X) is bounded as

\[ H(X) \ge -I(Y;Z) + I(X;Z) + H(D(Y)). \tag{3.5.1} \]

Both F(Z) and D(Y) are completely arbitrary single-valued functions,
and the equality holds when D(Y) preserves information. When F(Z)
is chosen so that

\[ I(X;Z) = 0 \tag{3.5.2} \]

this corresponds to the minimum error entropy,

\[ \min_{F(Z)} \{H(X)\} = H(\hat{X}) = H(D(Y)) - I(Y;Z). \tag{3.5.3} \]

Thus the maximum system performance improvement $H(D(Y)) - H(\hat{X})$ is
equal to the Sensor Channel Transmittance. Note that this theorem
relaxes the constraints on the estimation problem to such an extent
that now all the components may have arbitrary form.

Proof: 

Begin  with  the  system  equations  defined  by  Figure  3.4. 


It then follows that with U = D(Y), the joint probability density
function of X and Z is

\[ p_{XZ}(X,Z) = p_{UZ}(X+F(Z),Z). \]


This  equation  is  true  for  any  single  valued  function  F(Z) , information 


The  entropy  of  [X,Z]  must  be 


H(X,Z)  = H(U,Z)  = H(D(Y) ,Z) 


This expression is then introduced into the equation for the
information between X and Z in the following manner.

\[ I(X;Z) = H(X) - H(D(Y)) + [H(D(Y)) + H(Z) - H(D(Y),Z)] \]

\[ \phantom{I(X;Z)} = H(X) - H(D(Y)) + I(D(Y);Z) \]

\[ I(Y;Z) \ge I(D(Y);Z) \]

with the equality holding when the transformation D(Y) preserves
information. Therefore

\[ H(X) \ge H(D(Y)) + I(X;Z) - I(Y;Z). \]
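
The inequality I(Y;Z) >= I(D(Y);Z) used in this proof can be
illustrated with a small discrete example. The sketch below is only an
analogy, since the report works with continuous densities; the
four-level signal, the channel probabilities, and the helper
mutual_information are all hypothetical choices made for illustration.

```python
import numpy as np


def mutual_information(p_joint):
    """Mutual information (nats) of a discrete pair, given its joint probability table."""
    p_joint = np.asarray(p_joint, dtype=float)
    p_rows = p_joint.sum(axis=1, keepdims=True)  # marginal of the first variable
    p_cols = p_joint.sum(axis=0, keepdims=True)  # marginal of the second variable
    product = p_rows @ p_cols                    # joint table under independence
    mask = p_joint > 0
    return float(np.sum(p_joint[mask] * np.log(p_joint[mask] / product[mask])))


# Hypothetical sensor: Y is uniform on {0, 1, 2, 3}; Z equals Y with
# probability 0.7 and takes each of the other three values with probability 0.1.
channel = np.full((4, 4), 0.1) + 0.6 * np.eye(4)
p_yz = 0.25 * channel

# Information-destroying D: U = Y // 2 merges rows {0, 1} and rows {2, 3}.
p_uz_merged = np.vstack([p_yz[0] + p_yz[1], p_yz[2] + p_yz[3]])

# Information-preserving D: a relabeling (permutation) of the values of Y.
p_uz_permuted = p_yz[[2, 0, 3, 1]]

i_yz = mutual_information(p_yz)
i_merged = mutual_information(p_uz_merged)
i_permuted = mutual_information(p_uz_permuted)

print(i_yz, i_merged, i_permuted)
```

The merged case loses information about Z, while the relabeling
leaves the mutual information unchanged, matching the equality
condition stated for an information preserving D(Y).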


An example supporting the major conjectures of this theorem
is obtained by considering the simple (and classical) problem of
estimation of a Gaussian signal in additive Gaussian noise. Then:

D = I 

B = I 

C = I 

Z = Y + N 

and  F(Z)  will  obviously  be  a linear  function  of  Z. 

The optimum mean square filter must satisfy the requirement that
the minimum error be orthogonal to the signal Y + N [42, p.218]. This
leads to the equations

\[ E\{(Y - FZ)\,Z^T\} = 0 \]

\[ F\,E\{ZZ^T\} = E\{YZ^T\}. \]

For convenience all variables are taken with zero means. Using
the notation

\[ R_y = E\{YY^T\} \]

\[ R_n = E\{NN^T\} \]

the optimum filter is found to be the matrix

\[ F = R_y\,[R_y + R_n]^{-1}. \]

The minimum mean square error matrix is defined to be

\[ E\{\hat{X}\hat{X}^T\} = \sigma^2 = E\{(Y - FZ)(Y - FZ)^T\} = R_y - F R_y \]

\[ \phantom{\sigma^2} = R_y - R_y\,[R_y + R_n]^{-1} R_y \]

\[ \phantom{\sigma^2} = R_y\,\bigl[\,I - [R_y + R_n]^{-1} R_y\,\bigr]. \]

Simplification of this last expression leads to

\[ \sigma^2 = R_y\,[R_y + R_n]^{-1}\,(R_y + R_n - R_y) \]

\[ \sigma^2 = R_y\,[R_y + R_n]^{-1} R_n. \]

Since $\hat{X}$, the optimum error vector, is Gaussian, the error entropy may
be written directly as

\[ H(\hat{X}) = \tfrac{1}{2}\,\mathrm{LOG}\,\{(2\pi e)^K\,\mathrm{DET}[\sigma^2]\} \]

or, making use of $\sigma^2$ found above, $H(\hat{X})$ can be expanded to yield

\[ H(\hat{X}) = \tfrac{1}{2}\,\mathrm{LOG}\,\{(2\pi e)^K\,\mathrm{DET}[R_y]\}
 + \tfrac{1}{2}\,\mathrm{LOG}\,\{(2\pi e)^K\,\mathrm{DET}[R_y + R_n]\}^{-1}
 + \tfrac{1}{2}\,\mathrm{LOG}\,\{(2\pi e)^K\,\mathrm{DET}[R_n]\}. \]

Recognizing each of the terms on the right side of this equation
leads to

\[ H(\hat{X}) = H(Y) + H(N) - H(Y+N), \]

which  is,  of  course,  the  major  conclusion  of  the  theorem. 
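
The Gaussian example above can be checked numerically. The sketch
below assumes zero-mean, mutually independent Gaussian Y and N with
randomly generated covariance matrices; the names random_covariance and
gaussian_entropy, the dimension K, and the random seed are illustrative
assumptions.

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy (nats) of a zero-mean Gaussian vector with covariance cov."""
    k = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (k * np.log(2.0 * np.pi * np.e) + logdet)


def random_covariance(rng, k):
    """Hypothetical symmetric positive definite covariance matrix."""
    m = rng.normal(size=(k, k))
    return m @ m.T + k * np.eye(k)


rng = np.random.default_rng(1)
K = 4
R_y = random_covariance(rng, K)  # signal covariance
R_n = random_covariance(rng, K)  # noise covariance

# Optimum mean square filter and the resulting error covariance.
F = R_y @ np.linalg.inv(R_y + R_n)
sigma2 = R_y - F @ R_y
sigma2_closed_form = R_y @ np.linalg.inv(R_y + R_n) @ R_n

# Error entropy versus the entropy combination H(Y) + H(N) - H(Y+N).
h_error = gaussian_entropy(sigma2)
h_combination = (gaussian_entropy(R_y) + gaussian_entropy(R_n)
                 - gaussian_entropy(R_y + R_n))

print(h_error, h_combination)
```

The agreement relies on Y + N having covariance R_y + R_n, which holds
only under the independence assumption stated above.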

Because of the unique relationship between Gaussian variance and
Gaussian entropy it is suspected that an entropy expression exists
which unites the two points of view. The following corollary to
Theorem 3.5 provides this unification by considering the signal
conditioned by the measurements as the fundamental analysis quantity.



3.7  Corollary  I to  the  Estimation  Theorem  3.5 


Corollary  I: 

The entropy of the error vector is never less than the entropy of
the processed signal conditioned on the measurements,

\[ H(X) \ge H(D(Y)/Z) \tag{3.7.1} \]

with equality if the optimum filter is used.

Proof:

From equation (3.5.7) the error entropy is

\[ H(X) = H(D(Y)) + I(X;Z) - I(D(Y);Z). \]

Using

\[ I(D(Y);Z) = H(D(Y)) - H(D(Y)/Z) \]

and

\[ I(X;Z) \ge 0, \]

it follows that

\[ H(X) \ge H(D(Y)/Z), \]

where the equality holds when F(Z) is chosen so that

\[ I(X;Z) = 0, \]

thus proving the corollary.

This result resembles very much the accepted relationships of
classical Gaussian mean square analysis. In fact, in the special
case of Gaussian random variables in a linear system the corollary
reduces to

\[ \mathrm{DET}\,[\mathrm{VAR}\{\hat{X}\}] = \mathrm{DET}\,[\mathrm{VAR}\{D(Y)/Z\}] \tag{3.7.2} \]

an  equation  which  is  well  known  [45,  p.225].  Of  course  this  is  just 
a practical  example  of  the  supposition  first  made  by  DeGroot  [20] 
to  the  effect  that  in  uncertainty  analysis  the  various  measures, 
such  as  variance  and  entropy,  are  interchangeable. 
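
Corollary I can be illustrated for the Gaussian linear case with D = I
and Z = Y + N. The sketch below compares the conditional entropy
H(Y/Z), computed from the joint covariance of (Y, Z), with the error
entropy of the optimum linear filter and of a few randomly perturbed
(suboptimal) linear filters. The covariances, the perturbation size,
and all function names are assumptions made for illustration.

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy (nats) of a zero-mean Gaussian vector with covariance cov."""
    k = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (k * np.log(2.0 * np.pi * np.e) + logdet)


def random_covariance(rng, k):
    """Hypothetical symmetric positive definite covariance matrix."""
    m = rng.normal(size=(k, k))
    return m @ m.T + k * np.eye(k)


rng = np.random.default_rng(2)
K = 3
R_y = random_covariance(rng, K)
R_n = random_covariance(rng, K)
R_z = R_y + R_n  # covariance of Z = Y + N for independent Y and N

# H(Y/Z) = H(Y,Z) - H(Z), using the joint covariance of the stacked vector (Y, Z).
joint = np.block([[R_y, R_y], [R_y, R_z]])
h_y_given_z = gaussian_entropy(joint) - gaussian_entropy(R_z)


def error_entropy(F):
    """Entropy of the error X = Y - F Z for a linear filter F."""
    eye = np.eye(K)
    cov_x = (eye - F) @ R_y @ (eye - F).T + F @ R_n @ F.T
    return gaussian_entropy(cov_x)


F_opt = R_y @ np.linalg.inv(R_z)
h_opt = error_entropy(F_opt)
h_suboptimal = [error_entropy(F_opt + 0.3 * rng.normal(size=(K, K)))
                for _ in range(5)]

print(h_y_given_z, h_opt, h_suboptimal)
```

Only linear filters are tried here, so the sketch illustrates the
corollary rather than testing it over arbitrary F(Z).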

3.8  Steady-State  Entropy 

When  the  signal  processes  are  stationary  and  the  transformations 
are  time  invariant,  it  must  follow  that  after  a sufficiently  long 
time  steady-state  conditions  will  exist.  Under  such  conditions  it 
would  be  useful  to  know  the  entropy  of  one  coordinate  of  the  error 
vector.  This  knowledge  is  not  directly  available  since  all  the 
previous  entropy  expressions  are  always  written  for  vector  quantities. 
However the average entropy per coordinate, $\frac{1}{K} H_K(X)$, will serve the
same purpose.

Begin with equation (3.4.1), using S = B(Y)

\[ \frac{1}{K} H_K(X) \ge \frac{1}{K} H_K(D(Y)) + \frac{1}{K}\,[H_K(N) - H_K(S+N)] \tag{3.8.1} \]

by assumption

\[ \lim_{K\to\infty} \frac{1}{K} H_K(D(Y)) \]

must exist and it will have the value H(u), i.e., it is the entropy
of the steady-state processed signal.

According to the definition of Incremental Channel Transmittance
given in Chapter Two, it is possible to write

\[ H_K(N) - H_K(S+N) = -I_M(Y;Z) - \sum_{i=M+1}^{K} \Delta_i. \tag{3.8.2} \]



According to the result presented in equation (2.9.4), given any $\epsilon$,
arbitrarily small, it is always possible to find k sufficiently large
to ensure that

\[ \epsilon \ge \epsilon_i, \qquad i > k \]

where $\epsilon_i$ is defined as

\[ \epsilon_i = \Delta_i - \Delta_\infty. \tag{3.8.3} \]

M is now fixed so that

\[ M > k. \]

Combining equations (3.8.1), (3.8.2), (3.8.3), and taking the limit
yields

\[ \lim_{K\to\infty} \frac{H_K(X)}{K} \ge H(u) - \lim_{K\to\infty} \frac{I_M(Y;Z)}{K}
 - \lim_{K\to\infty} \frac{K-M}{K}\,\Delta_\infty
 - \lim_{K\to\infty} \frac{1}{K} \sum_{i=M+1}^{K} \epsilon_i \]

\[ \lim_{K\to\infty} \frac{H_K(X)}{K} \ge H(u) - \Delta_\infty - \lim_{K\to\infty} \epsilon \]

or

\[ H(u) - \lim_{K\to\infty} \frac{H_K(X)}{K} \le \Delta_\infty \]

which is the requirement that the steady-state system performance
improvement due to feed-forward estimation be limited by the
asymptotically stationary Incremental Channel Transmittance, $\Delta_\infty$.

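
The notion of an average entropy per coordinate that settles to a
steady-state value can be illustrated with a stationary Gaussian
first-order autoregressive process. This process, its coefficient a,
and the driving-noise variance sigma_w2 are assumptions chosen for the
sketch; they are not taken from the report.

```python
import numpy as np

a = 0.8          # hypothetical AR(1) coefficient; |a| < 1 gives a stationary process
sigma_w2 = 1.0   # hypothetical variance of the white driving noise


def entropy_per_coordinate(k):
    """(1/K) H_K for K consecutive samples of the stationary Gaussian AR(1) process."""
    idx = np.arange(k)
    # Stationary covariance: sigma_w2 / (1 - a^2) * a^|i - j|.
    cov = sigma_w2 / (1.0 - a**2) * a ** np.abs(idx[:, None] - idx[None, :])
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (k * np.log(2.0 * np.pi * np.e) + logdet) / k


# Steady-state entropy per coordinate of this process (entropy of the innovation).
steady_state = 0.5 * np.log(2.0 * np.pi * np.e * sigma_w2)

averages = {k: entropy_per_coordinate(k) for k in (1, 5, 25, 125)}
for k, h in averages.items():
    print(k, h, h - steady_state)
```

The gap to the steady-state value shrinks roughly as 1/K, which is the
kind of convergence the limit in this section presumes.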


CHAPTER  FOUR 


THE  FEEDBACK  CONTROL  PROBLEM 


4.1  Summary

The  most  important  application  of  the  techniques  of  entropy 
analysis  is  in  regard  to  feedback  control  systems.  In  this  chapter 
and  the  next,  two  types  of  closed  loop  control  systems  will  be 
investigated  and  a completely  new  interpretation  of  feedback  control 
will  be  derived. 

This present chapter is concerned with the performance of the
disturbance rejecting system shown in Figure 4.1 which maintains the
minimum entropy output, despite the presence of both external noise y
and sensor noise n. The major result of this chapter is Theorem
4.4, which can be stated mathematically as

\[ H(DY) - H(X) = I(U;Z) - I(X;Z). \]

This equation leads to an absolute description of the entropy of the
error vector X and to a bound on the improvement of system
performance, due to the use of feedback, that is the channel property of the
system sensor. Also from this equation follows the corollary:

\[ H(X) \ge H(DY/Z). \]

Evaluation of the resulting equations with regard to actually
achieving the equality (or the bound) reveals that these expressions
are not tight enough for real time data processing. This indicates
that Theorem 4.4 provides only the initial groundwork for an
investigation of feedback control systems. However, this preparatory
study is more than sufficient to demonstrate that it is again the
Sensor Channel Transmittance that limits system performance. The
developments in the future chapters are all based on this fundamental
idea.

[Figure 4.1. The Disturbance Rejecting Feedback Control System
(Regulator). Recoverable annotation: S = B(D(Y)).]

4.2  The  Feedback  Control  Problem;  Description 

A recurring problem in the design of the class of automatic
control systems, commonly referred to as regulators, is to isolate a
dynamical system from undesirable disturbances. Typically this
type of problem is solved through the use of feedback. A configuration
general enough to describe the most often encountered systems of this
class is shown in Figure 4.1. Here y is the disturbing input and
x is the system output. The sensor consists of a linear prefilter B,
a postfilter C, and an equivalent noise source n, which is used to
take into account any measurement error. The linear dynamical system
or "plant" is given by D while the feedback gain is denoted by F.

Often there are unchangeable elements of the plant in the feedback
loop. Since F is completely general it is convenient to assume that
such elements are already included as part of the feedback signal
processing. When the feedback gain, F, is chosen so as to achieve
an optimum (in some sense) system output it will be written as $\hat{F}$.

When  the  sensor  used  in  the  feedback  loop  is  absolutely  accurate, 
i.e.,  there  are  no  discretizing  errors,  measurement  noise,  distortion, 
etc.,  then  the  optimum  system  solution  is  to  use  a feedback  network 
having  infinite  D.C.  loop  gain  so  that  the  resulting  stable  control 
system provides complete rejection of the disturbances. On the
other hand, if the sensor is very noisy when compared to the effects
of the disturbing forces, then it would be expected that a low
feedback gain would be the optimum solution.

Using entropy as a criterion function, it is obvious that
values of feedback gain exist which will cause the system output
to assume, continuously, all the values of entropy between the
entropy of the sensor noise and the entropy of the uncontrolled
disturbances. However, it is not so obvious that feedback can
decrease the output entropy to less than either the signal entropy
or the disturbance entropy. Therefore it is important to ask how
much improvement (if any) over the uncontrolled system entropy can
a given sensor provide when used in a feedback configuration.

Since it is apparent that the performance of the resulting system is
directly related to the properties of the sensor, it is desirable
that the answer to this question should be based only on the
invariant characteristics of the sensor. Theorem 4.4 below
shows exactly how the Channel Transmittance of the sensor is used
to bound the performance improvement of a sampled data control
system using the given sensor in a feedback loop.

4.3  System Uniqueness

The feedback system of Figure 4.1 may be described by the
equation

\[ W = BDY + N - BDF(C(W)). \tag{4.3.1} \]

This equation may be solved as a function of BDY+N to yield the
expression

\[ W = g_1(BDY+N). \tag{4.3.2} \]

If g_1 is a single valued function of BDY+N it will be possible to
write the informational inequality

\[ I(\eta;W) \le I(\eta;BDY+N) \tag{4.3.3} \]

for any arbitrary random vector $\eta$. This is an important step in
the development of entropy analysis of feedback systems and so it
is critical to understand the conditions under which g_1 is a unique
function of BDY+N. To simplify the discussion somewhat, the signal
is set equal to zero and only the noise response of the system is
studied. The mathematical constraint on g_1(N) is that the Jacobian

\[ J(W,N) = \mathrm{DET}\left[\frac{\partial W}{\partial N}\right]^{-1} = \mathrm{DET}\,[\,I + BDF'(W)\,] \]

exist and be non-zero, always with the same sign. For a physically
realizable system this states that the perturbation response of
the system is stable. This agrees with experience: the noise response
of stable realizable systems is unique. The explanation of this
lies in the fact that the effect of a noise input can't appear
instantaneously at every point in the loop. One or more component outputs
are not changed because of a noise input. Under the assumption
that all system elements perform single valued transformations of
the inputs it must follow that any signal in the loop is a single
valued functional of the inputs and the initial conditions, if any.


When the feedback system is nonrealizable it is possible that
the noise response is not unique. For example in Figure 4.1,
take

D = d

B = 1

C = 1

F(z) = z^2

Then for a noise input, n,

\[ w = n - d\,w^2 \]

or w may have two values

\[ w = \frac{-1 \pm \sqrt{1 + 4dn}}{2d} \]

but if w has two values so does x,

\[ x = -d\,(x+n)^2 \]

or

\[ x^2 + x\left(2n + \frac{1}{d}\right) + n^2 = 0. \]


This clearly indicates that the mathematical description of the system
is incorrect, because what practical use can a multiple valued system
output be when it is impossible to tell which output state will
result from a given input? This type of system must be of limited
use and will not be considered in this dissertation; only systems
for which g_1(N) is unique will be allowed.
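
The two-valued response in this example can be reproduced numerically.
The sketch below solves the loop equation w = n - d w^2 for assumed
values of n and d and confirms that each solution also satisfies the
output relation x = -d (x + n)^2. The function name loop_solutions and
the numerical values are illustrative assumptions.

```python
import numpy as np


def loop_solutions(n, d):
    """Real solutions w of w = n - d*w**2, paired with the outputs x = -d*w**2."""
    roots = np.roots([d, 1.0, -n])  # d*w**2 + w - n = 0
    real_w = np.sort(roots[np.isreal(roots)].real)
    return [(float(w), float(-d * w**2)) for w in real_w]


n_value, d_value = 2.0, 1.0  # hypothetical noise sample and plant gain
solutions = loop_solutions(n=n_value, d=d_value)

for w, x in solutions:
    # Residuals of the two loop relations; both should be numerically zero.
    residual_w = w - (n_value - d_value * w**2)
    residual_x = x + d_value * (x + n_value) ** 2
    print(w, x, residual_w, residual_x)
```

With these values a single noise sample is consistent with two
different outputs, which is the ambiguity the text excludes by
requiring g_1(N) to be unique.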




4.4  The  Entropy  Theorem  for  Feedback  Control 


The  entropy  analysis  of  the  regulator  problem,  Figure  4.1,  is 
simplified  by  the  use  of  the  following  lemma. 

4.4.1  Lemma 

For closed loop noise rejecting control systems of the form
shown in Figure 4.1, having an arbitrary sensor configuration, the joint
entropy of the errors and the measurements is equal to the joint
entropy of the open loop signal and the measurements, i.e.,

\[ H(X,Z) = H(U,Z) \]

where

\[ U = D(Y). \]

For this lemma the plant, D(Y), is constrained to be linear.

Proof:

The variables X and U are related by

\[ X = D(Y) - F(Z). \]

Therefore

\[ p_{X,Z}(X,Z) = p_{U,Z}(X+F(Z),Z) \]

so that

\[ H(X,Z) = H(U,Z), \]

thus proving the lemma.

The results of an entropy analysis of the regulator of Figure 4.1
are summarized as Theorem 4.4. The main result is that the reduction
in the processed signal entropy, due to the use of feedback, cannot
exceed the open loop Sensor Channel Transmittance.

4.4.2  Theorem  4.4 

For the regulator problem shown in Figure 4.1, where B and D
are linear, C is information preserving, and the feedback gain F(Z)
is arbitrary:

1.  Minimizing the mutual information I(X;W) is equivalent to
minimizing the entropy of the error vector.

2.  The entropy of the error vector, H(X), always satisfies the
inequality

\[ H(X) \ge H(U) - I(U;S+N) = H_0 \tag{4.4.1} \]

where

U = DY,

S = BDY.

3.  Since I(U;S+N) is the open loop Sensor Channel Transmittance
it also follows that

\[ H(U) - H(X) \le H(S+N) - H(N). \tag{4.4.2} \]

The improvement in the system performance because of the use of
feedback is bounded by an open loop quantity that is independent of F(Z).

Proof:

The mutual information between the errors, X, and the measurements,
Z, is

\[ I(X;Z) = H(X) + H(Z) - H(X,Z). \tag{4.4.3} \]

Applying the lemma, and rewriting yields

\[ I(X;Z) = H(X) - H(U) + H(U) + H(Z) - H(U,Z) \]

\[ \phantom{I(X;Z)} = H(X) - H(U) + I(U;Z) \]

or

\[ H(U) - H(X) = I(U;Z) - I(X;Z). \tag{4.4.4} \]

This result is not particularly useful since it utilizes I(U;Z),
which depends on the sensor closed loop Channel Transmittance. The
reason that I(U;Z) is not useful is that to calculate it, or measure
it, requires specifying F(Z), and the whole purpose of entropy analysis
is to avoid doing that.


When the sensor noise is additive and the prefilter, B, is
linear the closed loop Channel Transmittance may be related to the
open loop Channel Transmittance as follows:

\[ Z = C(BDY + N - BDF(Z)). \]

Using

\[ S = BDY \]

this equation may be solved to obtain Z as some function of S+N, i.e.,

\[ Z = g_2(S+N) = C(g_1(S+N)). \]

As discussed in Section 4.3, g_1 must be single valued. Now, either
g_2 preserves information, or it does not, but in either case,

\[ I(U;Z) \le I(U;S+N) \]

so that then

\[ H(U) - H(X) \le I(U;S+N) - I(X;Z). \tag{4.4.5} \]




AD-A072  259  CALIFORNIA  UNIV  LOS  ANGELES  SCHOOL  OF  ENGINEERING  A— ETC  F/G  1/2 
ENTROPHY  ANALYSIS  OF  FEEDBACK  FLI6HT  DYNAMIC  CONTROL  SYSTEMS. (U) 

JAN  79  H L WE I DEM ANN*  C T LEONDES  F33615-77-C-3013 

UNCLASSIFIED  AFFDL-TR-78-123  NL 



Since only H(X) and I(X;Z) have a functional dependence on
F(Z) it must follow that minimizing H(X) must be identical to
minimizing I(X;Z). Since I(X;Z) is always non-negative the limit
on the ability of a feedback function, F(Z), to reduce H(X)
must be the Sensor Channel Transmittance, I(U;S+N).

Using equation (2.7.10)

\[ I(U;S+N) = H(S+N) - H(N), \]

equation (4.4.5) becomes

\[ H(U) - H(X) \le H(S+N) - H(N) = \Gamma_0 \tag{4.3.5} \]

with equality if g_2(S+N) is an information preserving transformation
and if F(Z) is chosen to achieve

\[ p(X,Z) = p_x(X)\,p_z(Z). \]

Thus, an inequality for the joint entropy of the error coordinates
has been found which is true for any feedback element, F, and does
not depend on knowing the feedback gain function in order to evaluate
it. The constant $\Gamma_0$ is a function of the parameters of the sensor
and the statistical properties of the input signal and immediately
sets the lower bound on the system entropy performance, irrespective
of how sophisticated the feedback optimization is made.
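
The bound can be illustrated with a deliberately simple case. The
sketch below assumes a scalar, static loop with D = B = C = 1,
independent zero-mean Gaussian u and n, and a linear feedback F(z) = f z
acting without delay, so that x = u - f z and z = x + n. Such an
instantaneous loop is not physically realizable in the sense discussed
in this chapter; the variances and the gain grid are hypothetical.

```python
import numpy as np

var_u = 4.0  # hypothetical variance of the open loop signal U (equal to S here)
var_n = 1.0  # hypothetical sensor noise variance


def improvement(f):
    """H(U) - H(X) in nats for the static loop x = u - f*z, z = x + n."""
    # Solving the loop gives x = (u - f*n) / (1 + f).
    var_x = (var_u + f**2 * var_n) / (1.0 + f) ** 2
    return 0.5 * np.log(var_u / var_x)


# Open loop Sensor Channel Transmittance H(S+N) - H(N) for Gaussian S and N.
bound = 0.5 * np.log((var_u + var_n) / var_n)

gains = np.linspace(0.0, 20.0, 201)
improvements = np.array([improvement(f) for f in gains])

# Gain that makes x uncorrelated with z in this scalar Gaussian model.
best_gain = var_u / var_n

print(bound, improvements.max(), improvement(best_gain))
```

In this idealized loop the bound is reached at the gain for which the
error and the measurement are uncorrelated, and no other gain on the
grid exceeds it.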

Thus it has been proven that the effectiveness of a feedback
loop in reducing system errors is determined by the ability of the
sensor to transmit information. This is an entirely new way of
looking at feedback control systems. It is the first time that the
entropy flow of signals around a feedback loop has been examined,
and the first time that it has been proven that system performance
is determined by the entropy handling capabilities of the components.


Rearranging the terms of equation 4.3.3 and using

\[ I(X;Z) \ge 0, \]

leads to

\[ H(X) \ge H(Y/W). \]

This  resembles  previously  obtained  conditional  entropy  expressions 
and  it  also  sets  a lower  bound  on  the  possible  system  error  entropy. 

It  is  disappointing  that  this  result  can  not  bridge  the  gap  to  the 
real  time  Gaussian  linear  solution  as  it  was  able  to  do  for  the 
estimation  problems.  The  reason  is  real  time  data  processing 
considerations  are  such  that  H(X)  can  not  be  caused  to  be  equal  to 
H(Y/W). 

Actually the question of achieving the bound for physically
realizable systems is the only disenchanting facet of this theorem.
The requirement for equality in equation 4.4.5 is that I(X;W) (or
I(X;Z)) be equal to zero. Of course, this is just the condition that
the function F(Z) be chosen so that the measurements Z are independent
of the error X. In a realizable closed loop control system the error
is generated in real time because the only data that may be taken
by the sensor is data about the errors and without data there cannot
be errors, etc. Therefore it is impossible to cause past coordinates
of the error vector to be independent of present or future measurements,
using a physically realizable filter, F, and H(X) can not be equal
to H_0. This disadvantage in no way impairs the validity of the
bounds derived for non-realizable systems but it does reduce the
usefulness of the theorem. The truly useful result must have a
tighter bound and it must be attainable. Obtaining this type of
result is considered so important that the contents of Chapter Six
will be devoted to deriving the real time solutions for both the
feedback and the estimation problems and to examining the properties
of that solution.



CHAPTER  FIVE 
THE  TRACKING  PROBLEM 

5.1  Introduction 

As  already  demonstrated  in  Chapter  Four,  the  most  significant 
application  of  entropy  analysis  is  for  the  study  of  feedback  control 
systems.  Through  the  use  of  Channel  Transmittance  concepts,  entropy 
analysis  provides  an  entirely  new  interpretation  of  the  benefits  and 
limitations  on  the  use  of  feedback  to  improve  system  performance. 
This  interpretation  is  in  complete  agreement  with  conventional 
verbal  descriptions  of  such  systems  but  until  now,  it  has  not  had  a 
suitable  mathematical  foundation. 

This  chapter  continues  the  study  of  entropy  analysis  by 
examining  the  performance  of  a closed  loop  tracking  system.  Results 
are  again  derived  which  demonstrate  that  the  performance  improvement 
of  a tracking  system  utilizing  a feedback  loop  over  that  of  an  open 
loop  system  is  directly  limited  by  the  Channel  Transmittance  of  the 
system  sensor. 

In this case the performance bounds can not be achieved and
therefore are not as tight as would be desired. However the
experience gained from the present limited approach paves the way for the
sharper results presented in the next chapter and those results
justify the continued emphasis of this dissertation on Channel
Transmittance as a fundamental concept in estimation and feedback
control.



5.2  The  Tracking  (or  Servomechanism)  Problem;  Description 


The  procedure  of  examining  the  mutual  information  between  the 
error  and  the  measurement  quantities  is  applicable  to  problems 
involving  noisy  measurements  of  signals  in  either  open  or  closed 
loop  configurations.  In  this  dissertation  the  (closed  loop) 
servomechanism  is  the  last  class  of  system  to  be  studied. 

The  main  purpose  of  a servomechanism  is  to  cause  the  output 
to  follow,  or  track  accurately  an  input  signal.  This  is  usually 
accomplished  by  sensing  the  difference  between  a function  of  the 
output  and  the  input,  filtering  this  quantity  and  then  applying  it 
to the plant. This configuration is shown in Figure 5.1. The most
important aspect of the system as considered here is the identification
of certain components of the system as being elements of a
measurement device. The form chosen for this sensor is general enough to
include the major characteristics of feedback devices ranging from
simple potentiometer pickoffs to gyros and human operators.

As  far  as  the  sensor  components  are  concerned,  B must  be  a 
linear  element  and  the  noise  must  be  additive.  The  filters  C and 
R may  be  non-linear,  as  long  as  they  are  information  preserving 
in  the  sense  of  Section  2.4.  The  optimizing  filter  and  all  fixed 
elements  of  the  feedback  path  may  also  be  included  into  R.  The 
fixed  part  of  the  system  is  D,  and  P is  a filter  which  may  be 
added  because  of  the  auxiliary  system  requirements  not  germane  to 
the  tracking  problem.  The  only  constraints  on  P and  D are  that  they 
be  information  preserving. 




Figure  5.1.  The  Tracking  Problem  (Servomechanism) . 


The  entropy  analysis  of  the  feedback  tracking  system  is  simplified 
through  the  use  of  the  following  lemma. 


For  closed  loop  tracking  systems  of  the  form  shown  in  Figure  5.1 
having  an  arbitrary  sensor  configuration,  the  joint  entropy  of  the 
signal  and  the  measurements  equals  the  joint  entropy  of  the  error 


\[ p_{X,Z}(X,Z) = p_{Y,Z}(X+F(Z),Z) \]


This lemma will now be used to prove the entropy theorem for
servomechanisms. The major result of this theorem is that the improvement
in system error performance because of the use of feedback is limited

For the tracking problem shown in Figure 5.1, where B is linear,
C is information preserving, and the feedback gain, F(Z), is arbitrary:



1.  Minimizing the mutual information I(X;W) is equivalent to
minimizing the entropy of the error vector.

2.  The entropy of the error vector, H(X), always satisfies the
inequality

\[ H(X) \ge H(Y) - I(Y;BY+N). \]

3.  Since I(Y;BY+N) is the open loop Sensor Channel Transmittance
it also follows that

\[ H(Y) - H(X) \le H(BY+N) - H(N). \]


Proof : 


The  proof  for  this  theorem  follows  the  same  pattern  as  the 
proof  for  the  regulator  theorem  (Section  4.4).  The  mutual  information 
between  the  error  X and  the  noisy  measurement  quantity  Z is  I(X;Z), 
which  may  be  written  as 

I(X;Z)  = H(X)  + H(Z)  - H(X,Z) 

= H(X)  - H(Y)  + H(Y)  + H(Z)  - H(Y,Z) 

H(Y)  - H(X)  = I(Y;Z)  - I(X;Z). 

I(Y;Z)  is  the  mutual  information  between  the  signal  Y and  the  closed 
loop  measurements  Z,  and  unless  F(Z)  is  given,  it  can  not  be  calculated 
or  measured.  However,  when  the  sensor  is  specifically  constrained 
to  have  linear  prefiltering  and  additive  noise  it  follows  that 

\[ Z = C(BY + N - BF(Z)) \]

\[ \phantom{Z} = g_2(BY+N) = C(g_1(BY+N)). \]

According to Section 4.3, g_1(·) is a single valued function so

\[ I(Y;Z) \le I(Y;BY+N). \]

If g_2(BY+N) is an information preserving transformation then this
last equation is an equality.

\[ H(Y) - H(X) \le I(Y;BY+N) - I(X;Z) \]

Only H(X) and I(X;Z) are functions of F(Z), so minimizing I(X;Z)
minimizes H(X). Using

\[ I(X;Z) \ge 0 \]

and

\[ I(Y;BY+N) = H(BY+N) - H(N) \]

leads to the final equation

\[ H(Y) - H(X) \le H(BY+N) - H(N). \]

Q.E.D.



CHAPTER  SIX 

REAL  TIME  DATA  PROCESSING 

6.1  Summary

An  unfortunate  limitation  of  the  theorems  of  the  preceding 
chapters  is  that,  in  order  to  achieve  the  lower  bound,  the  processor 
must  have  access  to  the  entire  time  history  of  the  sensor  output 
before  it  estimates  the  signal.  This  delay,  which  is  required  to 
make  all  possible  information  available,  is  not  a serious  defect 
for  some  categories  of  signal  estimation,  but  it  is  disastrous  in 
any  real  time  estimation  problem  and  catastrophic  for  feedback 
control  systems  where  the  presence  of  even  small  lags  carries  a 
high  penalty. 

Of course, this in no way impairs the validity of the bounds
derived above, but their shortcomings due to the time lag
requirement lead to the belief that in real time (or sequential) situations
even tighter bounds may exist. If such a tightened bound exists,
it is certainly obvious that the real time data processing system,
where the information gained because of the new measurements can no
longer be used to correct the previous errors, can have a performance
no better than the total delay system, and probably has a poorer
performance.

The work in this chapter develops the theory of sequential
information and shows how this entirely new concept adapts itself
readily to the real time peculiarities of closed loop feedback systems.
The first step is to define the Sequential Channel Transmittance of
the sensor. This property measures the sequential cumulative
acquisition of information about the signal as measurements are
made. This definition is contrived so that it completely discounts
information derived from the new measurements about past signal
samples. Of course this is what must be done in any real time
processing system if it is to remain physically realizable.

The next step is to consider the Incremental Sequential Channel
Transmittance, i.e., the amount of new information derived from one
new measurement. The sum of all the incremental transmittances up
to some time is obviously the Sequential Channel Transmittance.

The incremental quantity meets two important needs. First, it
measures the importance of new measurements. Secondly, if this
quantity approaches zero, it provides an indication of what the
asymptotic Sequential Channel Transmittance will be and how fast
the asymptotic limit is approached. This asymptotic Sequential
Transmittance is the irreducible uncertainty of the signal and no
amount of additional measurements and data processing can decrease
the conditioned signal entropy below this level, even if the data
processing is optimum in the entropy sense.

Knowing  that  the  total  delay  system  performance  of  an  estimation 
system  is  limited  by  Channel  Transmittance  as  shown  by  Theorem  3.4, 
and  suspecting  that  Sequential  Channel  Transmittance  plays  an 
analogous  role  in  real  time  systems,  it  seems  reasonable  to  assume 
that  a theorem  analogous  to  Theorem  3.4  can  be  proven  for  the  real 
time estimation case. It turns out that such a theorem can be proven
by direct application of the following result, which will be proved
as a lemma later.

"The  joint  entropy  of  the  error  and  the  measurements  equals 
the  joint  entropy  of  signal  and  the  measurements." 

Similarly, for real time closed loop feedback systems, an analogous
lemma and an analogous sequential theorem can be proven. For both
the estimation and the feedback problem the entropy bounds are tight
(i.e., achievable) so that optimum realizable performance may be
predicted and obtained. In the special Gaussian-linear case, the
mathematical representations of the theorems for both types of systems
admit simplifications which reduce them to the same results as
obtained by conventional Gaussian-linear analysis, which shows the
generality of the new channel approach to system analysis.

6.2  The  Sequential  Channel 

In many situations it is desirable to know the Sequential Channel
Transmittance of a sensor, which will be denoted by I(y_k;Z) and which
is defined by the relationship

    I(y_k; Z) = H(Z) - H(Z/y_k).

By continuing to interpret the sensor as a measurement device, it is
seen that this quantity represents the information between the last
input and all the previous measurements. It is easy to see that

    I(y_k; Z) \ge I(y_k; z_k),

where I(y_k; z_k) is the information between the last input and the
last output. This implies that even though some of the measurements
z_i, i \ne k, were not made of the y_k variable, z_i is still
statistically related to y_k and can therefore provide useful
information about y_k.
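
The inequality above can be seen in one line from the chain rule for mutual information. The block below is a sketch of that step only; the symbol Z' (all measurements other than z_k) is notation introduced here for illustration and does not appear in the report.

```latex
% Z = (Z', z_k), where Z' collects the measurements z_i, i \ne k (assumed notation)
\begin{align*}
I(y_k; Z) &= I(y_k; z_k) + I(y_k; Z' \mid z_k) \\
          &\ge I(y_k; z_k), \qquad \text{since } I(y_k; Z' \mid z_k) \ge 0 .
\end{align*}
```

Equality holds only when the remaining measurements carry no further information about y_k once z_k is known.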

The Sequential Channel Transmittance is different from the
Channel Transmittance, I(Y;Z), previously defined. In the previous
case, Channel Transmittance is the information between all the inputs
and all the measurements. This quantity implies that, in some way,
time may be made to stand still until all the components of Y are
measured, and then all the information in Z about Y may be utilized
for whatever end is desired. Sequential Channel Transmittance
implies just the opposite and emphasizes the inevitable march of
time, and the uselessness of a measurement that comes too late.

Control systems are an example of how the "arrow of time"
renders some information useless, for no matter how strongly additional
measurements are related (in a statistical sense) to previous inputs
it is too late to make use of the information.

When  C is  an  information  preserving  transformation,  it  follows 
that  for  the  sensor  shown  in  Figure  6.1 

    I(y_k; Z) = I(y_k; W) = H(W) - H(W/y_k).

Since H(W) = H(BY+N), in order to determine I(y_k;Z) it remains to
determine the conditional entropy of W given y_k. A useful relationship
between the respective probability density functions is

    p_{wy}(W, Y) = p_y(Y) p_n(W - BY).




Then 


where in the last integral the notation

    r_k \equiv y_k

and

    dR_{k-1} = (dr_{k-1})(dr_{k-2}) ... (dr_1)

has been used.

6.3  Incremental  Sequential  Transmittance 

The Incremental Sequential Transmittance is \Delta_K, where the
definition of this quantity is

    \Delta_K = I(y_K; Z_K) - I(y_{K-1}; Z_{K-1}).

In the stationary, time invariant case,

    I(y_{K-1}; Z_{K-1}) \le I(y_K; Z_K),

therefore I(y_K; Z_K) is a monotonically increasing function of K.
Expanding I(y_K; Z_K) according to property 16, Table I, Section 2.2,

    I(y_K; Z_K) = H(y_K) - H(y_K/Z_K).

When the processes are stationary and y is generated by a Markov
source,

    H(y_K) = H(y)

and, according to Theorem 2.2.2,

    \lim_{K \to \infty} H(y_K/Z_K) = H(y_\infty/Z_\infty),

which implies that the Sequential Channel Transmittance approaches a
steady-state value. Moreover, it follows that

    \lim_{K \to \infty} \Delta_K = 0.

The  interpretation  of  this  result  is  that  even  though  the  sequential 
information  about  y contained  in  the  measurements  is  increasing,  i.e., 
the  entropy  of  y conditioned  on  the  measurements  is  decreasing,  the 
effectiveness  of  the  oldest  measurements  to  provide  information  about 
the  latest  signal  is  decreasing  to  zero. 
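
The behaviour described above can be illustrated numerically. The following is a minimal sketch, not taken from the report: it assumes a stationary Gaussian first-order Markov signal with correlation `rho`, measured in independent white Gaussian noise (z_i = y_i + n_i), and the function name and parameter values are hypothetical. Under those assumptions I(y_K; Z_K) follows from the conditional variance of y_K given the K measurements.

```python
import numpy as np

def sequential_transmittance(K, rho=0.9, sig2=1.0, noise2=0.5):
    """I(y_K; Z_K) in nats for an assumed Gaussian AR(1) signal, z_i = y_i + n_i."""
    idx = np.arange(K)
    # Stationary Markov covariance: E{y_i y_j} = sig2 * rho**|i-j|
    Ry = sig2 * rho ** np.abs(idx[:, None] - idx[None, :])
    Rz = Ry + noise2 * np.eye(K)          # covariance of the K measurements
    c = Ry[:, -1]                         # cross-covariance between Z_K and y_K
    cond_var = sig2 - c @ np.linalg.solve(Rz, c)   # variance of y_K given Z_K
    # Gaussian case: H(y_K) - H(y_K/Z_K) = 0.5 * log(prior var / conditional var)
    return 0.5 * np.log(sig2 / cond_var)

# Sequential transmittance for K = 1..15 and its increments Delta_K
I = np.array([sequential_transmittance(K) for K in range(1, 16)])
delta = np.diff(I)
```

In this sketch `I` rises toward a steady-state value while `delta` shrinks rapidly, which is the limiting behaviour of the Incremental Sequential Transmittance stated above.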

6.4  The Sequential Entropy Theorem for Estimation

6.4.1  Introduction 

The  total  vector  derivations  made  in  Chapter  Three  have  hinted 
strongly  that  a real  time  entropy  solution  for  the  problem  exists, 
and  experience  indicates  that  that  solution  will  be  a function  of  the 
real  time  capability  of  the  sensor  to  transmit  information,  i.e.,  the 
Sensor  Sequential  Channel  Transmittance  property.  But  before  this 
aspect  of  the  problem  is  investigated,  it  is  convenient  to  prove  the 
following  lemma  which  will  be  used  in  the  proof  of  the  sequential 
estimation  theorem. 


Lemma

For the signal estimating system shown in Figure 6.2, where
C is information preserving, the joint entropy of the error and the
measurements is equal to the joint entropy of the processed signal
and the measurements, i.e.,

    H(x_K, W) = H(u_K, W)    (6.4.1)

Proof:

The proof will proceed by examining H(x_K,W) and H(u_K,W) separately,
reducing the expressions so that they are functions of the fundamental
quantities p_y(Y) and p_n(N), and then showing that the two expressions
are indeed equal.

The formal definition of H(x_K,W) is:

    H(x_K, W) = -\int_{-\infty}^{\infty} dx_K \int_{-\infty}^{\infty} dW  p(x_K, W) LOG p(x_K, W)

              = -\int_{-\infty}^{\infty} dX \int_{-\infty}^{\infty} dW  p(X, W) LOG p(x_K, W).    (6.4.2)

Using a previously derived expression for p(X,W), namely equation
(3.4.9),

    p_{xw}(X, W) = p_y(Y) p_n(N) |DET [\partial_Y D(Y)]^{-1}|    (6.4.3)

and the system equations

    D(Y) = X + F_c(W)    (6.4.4a)

    N = W - B D^{-1}(X + F_c(W)).    (6.4.4b)

It follows that:

    [\partial_Y D(Y)]^{-1} = \partial_X D^{-1}(X + F_c(W)).    (6.4.5)

Using the probability formula and making the change of variable

    Y = D^{-1}(X + F_c(W)),    (6.4.6)

the joint entropy of the scalar x_K and the vector W becomes

    H(x_K, W) = -\int dY \int dW  p_y(Y) p_n(W - BY) LOG [P],    (6.4.7)

where [P] denotes the argument of the logarithm in equation (6.4.2),

    [P] = p(x_K, W) = \int dX_{K-1}  p(X, W).    (6.4.8)

The important term in equation (6.4.7) (and also the most difficult
term to handle) is the argument of the logarithm and it will now be
studied separately.




The following simplifying vector notation is used:

    X \equiv COL {x_1, x_2, ..., x_K}    (6.4.9a)

    X_{K-1} \equiv COL {x_1, x_2, ..., x_{K-1}}    (6.4.9b)

    x_K = d_K(Y) - F_{c_K}(BY + N) = d_K(Y) - F_{c_K}(W)    (6.4.9c)

    dX_{K-1} = dx_1 dx_2 ... dx_{K-1}    (6.4.9d)

The quantity [P] can be simplified by making the change of variable:

    R = D^{-1}(X + F_c(W))    (6.4.10)

or, using X as a function of R, this transformation is:

    X = D(R) - F_c(W).    (6.4.11)

Then

    [P] = \int_{-\infty}^{\infty} dR_{K-1}  p_y(R) p_n(W - BR) |DET [\partial_R^K D(R)]^{-1} DET [\partial_R^{K-1} D(R_{K-1})]|    (6.4.12)

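where

    \partial_Y^K D = [ \partial d_i(Y) / \partial y_j ],   i, j = 1, ..., K    (6.4.13)

is the K dimensional Jacobian of the transformation U = D(Y) and

    \partial_Y^{K-1} D = [ \partial d_i(Y) / \partial y_j ],   i, j = 1, ..., K-1    (6.4.14)

is the (K-1) dimensional Jacobian of the transformation
U_{K-1} = D(Y_{K-1}), with D(Y) defined as

    D(Y) = COL {d_1(Y), d_2(Y), ..., d_K(Y)}.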

When the simplified expression for [P] is used in the joint entropy
H(x_K,W) it becomes:

    H(x_K, W) = -\int dY \int dW  p_y(Y) p_n(W - BY)
                LOG [ \int dR_{K-1}  p_y(R) p_n(W - BR) |DET [\partial_R^K D]^{-1} DET [\partial_R^{K-1} D]| ]    (6.4.15)


Having derived this expression, it is now convenient to shift attention
and derive a similar (in fact an identical) expression for H(u_K,W).

Consideration of H(u_K,W) is based on examining the transformation
between [Y,N] and [W,U], i.e.,

    W = BY + N

    U = D(Y)

The Jacobian of this change of variables is given symbolically as:

    J = |DET [ B, I ; \partial_Y D(Y), 0 ]| = |DET \partial_Y D(Y)|    (6.4.16)

so that the relationships between the various probability density
functions are

    p_{wu}(W, U) dW dU = p_y(Y) p_n(N) dY dN    (6.4.17)

    p_{wu}(W, U) = p_y(D^{-1}(U)) p_n(W - B D^{-1}(U)) |DET \partial_U D^{-1}(U)|.    (6.4.18)

The joint entropy of u_K and W, in terms of p_y(·) and p_n(·), is found
by using the probability formulas in the entropy definition

    H(u_K, W) = -\int_{-\infty}^{\infty} dU \int_{-\infty}^{\infty} dW  p_{uw}(U, W) LOG \int_{-\infty}^{\infty} dQ_{K-1}  p_{uw}(Q, W),

to yield

    H(u_K, W) = -\int_{-\infty}^{\infty} dU \int_{-\infty}^{\infty} dW  |DET [\partial_U D^{-1}(U)]| p_y(D^{-1}(U)) p_n(W - B D^{-1}(U))
                LOG \int_{-\infty}^{\infty} dQ_{K-1}  |DET [\partial_Q D^{-1}(Q)]| p_y(D^{-1}(Q)) p_n(W - B D^{-1}(Q))    (6.4.19)

where Q is a dummy variable of integration and q_K = u_K. It is important
not to confuse the variables of the expectation integral with the
variables of the integration performed within the LOG argument in order
to achieve the necessary marginal probability distribution. In the
expectation integral make the change of variable defined by:



    D^{-1}(U) \equiv Y    (6.4.20)

    D^{-1}(Q) \equiv \alpha    (6.4.21)

where q_K = u_K.

With these new variables

    dU |DET [\partial_U D^{-1}(U)]| = dY    (6.4.22)

    [\partial_Q D^{-1}(Q)] = [\partial_\alpha D(\alpha)]^{-1}    (6.4.23)

    dQ_{K-1} = d\alpha_{K-1} |DET [\partial_\alpha^{K-1} D(\alpha_{K-1})]|.    (6.4.24)

It then follows that:

    H(u_K, W) = -\int dY \int dW  p_y(Y) p_n(W - BY)
                LOG [ \int d\alpha_{K-1}  p_y(\alpha) p_n(W - B\alpha) |DET [\partial_\alpha^K D(\alpha)]^{-1} DET [\partial_\alpha^{K-1} D(\alpha_{K-1})]| ]    (6.4.25)

This expression is identical to the expansion for H(x_K,W); therefore

    H(u_K, W) = H(x_K, W)    (6.4.26)

and the lemma is proved.




6.4.2  Proof  of  the  Theorem  for  Sequential  Estimation 


1.  For the general estimation problem shown in Figure 6.2, with
arbitrary F(Z) and C information preserving, the entropy of a single
scalar error always satisfies the equality

    I(x_K; W) = H(x_K) - H(d_K Y) + I(d_K Y; BY+N)

and the inequality

    H(x_K) \ge H(d_K Y) + H(BY+N/d_K Y) - H(BY+N) = H_0.

2.  The system performance improvement due to feed-forward
estimation is limited by the Sensor Sequential Channel Transmittance

    H(d_K Y) - H(x_K) \le H(BY+N) - H(BY+N/d_K Y) = I(d_K Y; BY+N).    (6.4.29)

3.  Minimizing the mutual information I(x_K;W) is equivalent to
minimizing the error entropy.

4.  The minimum error entropy occurs when I(x_K;W) = 0 and is

    H_0 = H(d_K Y) + H(BY+N/d_K Y) - H(BY+N).

This value is attained if the optimum filter F is chosen so that the
most recent error, x_K, is independent of all the previous measurements.


The proof, although more involved than those studied previously,
follows the format that has been successfully applied to the total
error vector case. As before, begin by examining the mutual information
of the scalar x_K and the vector W,

    I(x_K; W) = H(x_K) + H(W) - H(W, x_K).    (6.4.31)

H(W) is H(BY+N), so the only problem is to determine H(W,x_K). By the
lemma,

    H(x_K, W) = H(u_K, W) = -I(u_K; W) + H(W) + H(u_K).    (6.4.32)

This last equation can now be introduced into the mutual information
I(x_K;W) to yield the basic result of this theorem,

    I(x_K; W) = H(x_K) + H(W) + I(u_K; W) - H(W) - H(u_K)

    I(x_K; W) = H(x_K) - H(u_K) + I(u_K; W).    (6.4.33)

Rearranging this equation produces

    H(u_K) - H(x_K) = I(u_K; W) - I(x_K; W)    (6.4.34)

and, since I(x_K;W) \ge 0,

    H(u_K) - H(x_K) \le I(u_K; W)    (6.4.35)

and the theorem is proved.
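
Equation (6.4.33) can be checked numerically in the Gaussian-linear special case. The sketch below is illustrative only and makes several assumptions not stated in the report: B = C = D = I, so W = Y + N and u_K = y_K; the filter is linear, x_K = y_K - f^T W; the covariances are arbitrary test values; and all function names are hypothetical.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (nats) of a zero-mean Gaussian with covariance cov."""
    cov = np.atleast_2d(cov)
    n = cov.shape[0]
    return 0.5 * (n * np.log(2.0 * np.pi * np.e) + np.linalg.slogdet(cov)[1])

def joint_cov_with_W(row, Ry, Rn):
    """Covariance of (s, W), where s = row @ [Y; N] and W = Y + N."""
    K = Ry.shape[0]
    zero = np.zeros((K, K))
    S = np.block([[Ry, zero], [zero, Rn]])            # covariance of [Y; N]
    T = np.vstack([row, np.hstack([np.eye(K), np.eye(K)])])
    return T @ S @ T.T

def entropy_terms(f, Ry, Rn):
    """Return H(x_K), H(u_K), I(x_K;W), I(u_K;W) for x_K = y_K - f @ W."""
    K = Ry.shape[0]
    e = np.zeros(K)
    e[-1] = 1.0                                       # picks out the Kth sample
    Ju = joint_cov_with_W(np.concatenate([e, np.zeros(K)]), Ry, Rn)
    Jx = joint_cov_with_W(np.concatenate([e - f, -f]), Ry, Rn)
    H_W = gaussian_entropy(Ju[1:, 1:])
    H_u = gaussian_entropy(Ju[:1, :1])
    H_x = gaussian_entropy(Jx[:1, :1])
    I_uW = H_u + H_W - gaussian_entropy(Ju)
    I_xW = H_x + H_W - gaussian_entropy(Jx)
    return H_x, H_u, I_xW, I_uW

rng = np.random.default_rng(0)
K = 4
A = rng.standard_normal((K, K))
Ry = A @ A.T + np.eye(K)              # assumed signal covariance
Rn = 0.5 * np.eye(K)                  # assumed white measurement noise
e_K = np.zeros(K)
e_K[-1] = 1.0
f_any = rng.standard_normal(K)                   # an arbitrary linear filter
f_opt = np.linalg.solve(Ry + Rn, Ry @ e_K)       # minimum variance weights
```

For `f_any` the returned terms satisfy I(x_K;W) = H(x_K) - H(u_K) + I(u_K;W) to rounding error; for `f_opt` the error is uncorrelated with W, so I(x_K;W) vanishes and H(x_K) reaches H(u_K) - I(u_K;W).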

The quantity I(u_K;W) is the mutual information between a sample
of the processed signal u_K and the measurements of Y. It appears
that the ability of feed-forward sequential estimation to reduce
the uncontrolled entropy of a processed signal is limited by the
amount of information provided about the processed signal due to
measurements of the signal. This observation remains true even when
the sensor does not have the specific form given in Figure 6.2. The
proof of this is given in the following corollary.


Corollary I:

If a signal y_K, which is the last element of the sequence
y_1, y_2, ..., y_k, ..., y_K, is measured by a sensor having no
description other than its processed Channel Transmittance, I(u_K;Z),
then the entropy of the error H(x_K) in estimating d_K(Y) is given
by

    H(x_K) = H(d_K(Y)) + I(x_K; Z) - I(d_K(Y); Z).    (6.4.36)

If f_K(Z) is chosen to achieve

    I(x_K; Z) = 0

then

    MIN_{f_K(Z)} {H(x_K)} = H(d_K(Y)) - I(d_K(Y); Z).    (6.4.37)

Note that neither d_K(Y) nor f_K(Z) is constrained to preserve information.


Proof:

Since Figure 6.3 yields the system equations

    x_K = d_K(Y) - f_K(Z)


Z = OY  + Z 


- V 

the joint probability density function of x_K and Z is

    p_{xz}(x_K, Z) = p_{uz}(x_K + f_K(Z), Z),

from which it follows that

    H(x_K, Z) = H(u_K, Z) = H(d_K(Y), Z).


Figure  6.3.  The  Real  Time  Estimation  Problem 
with  Generalized  Sensor. 




This entropy equation is true regardless of the nature of f_K(Z) and
no constraints as to information preservation are implied. The
mutual information between the error and the measurements is

    I(x_K; Z) = H(x_K) + H(Z) - H(x_K, Z)

              = H(x_K) - H(d_K(Y)) + [H(d_K(Y)) + H(Z) - H(d_K(Y), Z)]

              = H(x_K) - H(d_K(Y)) + I(d_K(Y); Z)

which proves the corollary.

Because it depends on I(u_K;Z), this result is not as "clean" as the
one for the non-real time estimation problem, where the system
performance was limited directly by the Sensor Channel Transmittance.
The discrepancy comes about because of the real time requirements on
the estimating procedure. The scalar u_K may be a function of all the
past signal samples, y_k, but Sequential Channel Transmittance is
defined in terms of only the latest signal sample, y_K. A modified
sensor such as shown in Figure 6.4 may be defined if D is an
information preserving transformation. The change in system
performance of this new system, due to feed-forward estimation, is
now limited by the modified sensor Sequential Channel Transmittance.*

When no reliable analytical descriptions of the system components
are available, the conclusion of Corollary I implies that it would be
useful to carry out the open loop experiment in Figure 6.5. The
result of this empirical investigation, I(u_K;Z), can then be properly
utilized in the preliminary design of feed-forward estimators because
it is the upper limit on system entropy improvement.

*The modification of the system in this manner is not entirely analogous
to the situation that results from taking D as the identity operator
but bears a resemblance to that approach in that if D=I, channel
capacity is again the performance limiting factor.


Figure 6.4.  A Modified Estimation Problem.


Figure 6.5.  A Recommended Experiment (the calculation of mutual
information).


While the performance bound is not totally pleasing since it
depends on I(u_K;Z) and not on the Sensor Channel Transmittance,
I(y_K;Z), it is mathematically sound and intuitively correct and it does
lead to the following important corollary.


6.5  Corollary  II  to  the  Sequential  Estimation  Theorem 


Corollary  II: 

When the causal optimum filter is chosen so that the most recent
error is independent of all measurements, the entropy of the error
is equal to the conditional entropy of the processed signal
(conditioned on all previous measurements), i.e.,

    H(x_K) = H(u_K/W).    (6.5.1)

Proof:

The optimum filter is chosen so as to cause

    I(x_K; W) = 0.

This causes

    H(x_K) = H(u_K) - I(u_K; W) + I(x_K; W)    (6.5.2)

to become

    H(x_K) = H(u_K) - I(u_K; W)    (6.5.3)

           = H(u_K) - [H(u_K) - H(u_K/W)]    (6.5.4)

    H(x_K) = H(u_K/W)

Q.E.D.

The uncertainty in x_K is the uncertainty of the processed input
signal, given all the measurements.

Theorem 2.2.2 shows that when u_K is a Markov process and is
stationary, H(u_K/W) is monotonically decreasing and approaches a
finite value and therefore

    H(x_\infty) \le H(x_K) \le H(x_{K-1}).    (6.5.5)

Equation (6.5.5) is interesting because it demonstrates that under
the stated conditions the error entropy is monotonically decreasing
and approaches a finite steady-state value given as

    H(x_\infty) = H(u_\infty/W_\infty).

It  is  interesting  to  note  that  this  result  is  not  true  in  general. 
Under  certain  conditions  the  entropy  of  x can  decrease  without  limit. 
For  example,  if  the  signal  y were  a constant  then  after  a sufficiently 
long  time  the  estimate  of  y derived  from  noisy  measurements  would 
have  arbitrarily  small  error  and  the  entropy  of  that  error  would  have 
no  lower  bound. 
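
The constant-signal case can be made concrete with a short calculation. This is a sketch under assumptions that are not in the report: the constant is taken to be Gaussian with variance `sig2`, the K measurements have independent Gaussian noise of variance `noise2`, and the function name is hypothetical. The minimum error variance is then sig2*noise2/(K*sig2 + noise2), which tends to zero as K grows.

```python
import numpy as np

def error_entropy_constant_signal(K, sig2=1.0, noise2=1.0):
    """Entropy (nats) of the minimum variance error after K noisy looks at a constant."""
    # Posterior variance of a ~ N(0, sig2) given z_i = a + n_i, i = 1..K
    var = sig2 * noise2 / (K * sig2 + noise2)
    return 0.5 * np.log(2.0 * np.pi * np.e * var)

Ks = [1, 10, 100, 10**4, 10**6]
entropies = [error_entropy_constant_signal(K) for K in Ks]
```

The entries of `entropies` keep falling, roughly as -(1/2) LOG K, so no finite steady-state error entropy exists in this case.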

6.6  Gaussian  Example 

The sample problem described in Section 3.6 is again examined.
Contrary to the previous examples, the entropy solution to the real
time Gaussian estimation of a signal in additive noise is not very
simple. This is because the Gaussian mean square solution to the
problem is not very simple.

First  let  us  determine  the  minimum  variance  error  using  the 
classical  techniques  of  Gaussian  estimation  theory.  If  yK  is  the  last 





It is obvious that

    \hat{y}_K = \underline{I}^T R_y [R_y + R_n]^{-1} Z

where \underline{I} is the column vector with zero entries in all but
the Kth coordinate, i.e.,

    \underline{I} = COL {0, 0, ..., 1}.

Using this expression the minimum error variance becomes:

    \sigma^2 = \underline{I}^T [R_y - R_y [R_y + R_n]^{-1} R_y] \underline{I}

since

    R_y = E{Y Y^T}

and

    R_n = E{N N^T},

    \sigma^2 = \underline{I}^T R_y [R_y + R_n]^{-1} R_n \underline{I}.

Because x is a Gaussian random variable its entropy is:

    H(x_K) = (1/2) LOG (2 \pi e \sigma^2).

This is the final result and resembles very much, in form, the error
found for the non-causal case in Section 3.6.

The entropy theorem on real time estimation applies, with the
following system parameters:

D = I 
B = I 
C = I 
Z = Y+N 




Since all the variables are Gaussian, the optimum filter is linear,
the error can be made independent of the measurements and the equality

    H(x_K) = H(y_K) + H(Y+N/y_K) - H(Y+N)

holds. Because

    H(Y+N/y_K) = H(Y+N, y_K) - H(y_K),

this becomes

    H(x_K) = H(Y+N, y_K) - H(Y+N).

The first step in determining this solution is to evaluate H(Y+N, y_K),

    H(Y+N, y_K) = -\int dY \int dN  p_y(Y) p_n(N) LOG p_{y+n,y}(Y+N, y_K),

    p_{y+n,y}(Z, y_K) = \int dQ_{K-1}  p_y(Q) p_n(Z - Q),    q_K = y_K.


Using this last expression the term p_y(Q) p_n(Z - Q) may be written as
the product of three terms, two of which may be identified as being
Gaussian probability density functions, G_1 and G_2, defined by

    G_1 = exp{ -(1/2) (Q - R_y [R_y+R_n]^{-1} Z)^T [R_y^{-1} + R_n^{-1}] (Q - R_y [R_y+R_n]^{-1} Z) }
          / ( (2\pi)^{K/2} \sqrt{DET [R_y^{-1} + R_n^{-1}]^{-1}} )

and

    G_2 = exp{ -(1/2) Z^T [R_y + R_n]^{-1} Z } / ( (2\pi)^{K/2} \sqrt{DET [R_y + R_n]} ) = p_z(Z)

and DET [R_y R_n] DET [R_y^{-1} + R_n^{-1}] = DET [R_y + R_n]. Since G_2
is independent of Q, and since the distribution of Z = Y + N is the
more important one, it follows that


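    H(Y+N, y_K) = -\int dY \int dN  p_y(Y) p_n(N) LOG [ \int dQ_{K-1} G_1 ]
                  - \int dY \int dN  p_y(Y) p_n(N) LOG G_2.

Obviously the last term is H(Y+N) and the first term can be simplified
as follows. G_1 is Gaussian and a function of the K variables
q_1, q_2, ..., q_K; \int dQ_{K-1} G_1 must also be Gaussian and is a
function of only q_K. Using \underline{I} = COL {0, 0, ..., 1} it may
be expressed as:

    \int dQ_{K-1} G_1 = exp{ -(q_K - \underline{I}^T R_y [R_y+R_n]^{-1} Z)^2 / (2 \underline{I}^T [R_y^{-1}+R_n^{-1}]^{-1} \underline{I}) }
                        / \sqrt{2\pi \underline{I}^T [R_y^{-1}+R_n^{-1}]^{-1} \underline{I}}.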

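In terms of the previous derivations

    DET [R_y^{-1} + R_n^{-1}]^{-1} = DET [R_y [R_y+R_n]^{-1} R_n]

and

    \underline{I}^T [R_y^{-1} + R_n^{-1}]^{-1} \underline{I} = \underline{I}^T R_y [R_y+R_n]^{-1} R_n \underline{I};

moreover,

    E{ (y_K - \underline{I}^T R_y [R_y+R_n]^{-1} Z)^2 }
        = \sigma_y^2 - 2 \underline{I}^T R_y [R_y+R_n]^{-1} R_y \underline{I} + \underline{I}^T R_y [R_y+R_n]^{-1} R_y \underline{I}

        = \sigma_y^2 - \underline{I}^T R_y [R_y+R_n]^{-1} R_y \underline{I}

        = \underline{I}^T R_y [R_y+R_n]^{-1} R_n \underline{I}

so that finally,

    H(Y+N, y_K) = H(Y+N) + 1/2 + (1/2) LOG [ (2\pi) \underline{I}^T R_y [R_y+R_n]^{-1} R_n \underline{I} ].

Since 1/2 = (1/2) LOG_e(e), the error entropy at the Kth instant is now
found to be

    H(x_K) = (1/2) LOG [ (2\pi e) \underline{I}^T R_y [R_y+R_n]^{-1} R_n \underline{I} ]

which is, of course, identical to the solution obtained through the
use of conventional mean square analysis.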
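
The agreement between the entropy route and the mean square route can be confirmed numerically. The sketch below assumes the example's system parameters (D = B = C = I, Z = Y + N); the particular covariance matrices and variable names are illustrative choices, not values from the report.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (nats) of a zero-mean Gaussian with covariance cov."""
    cov = np.atleast_2d(cov)
    n = cov.shape[0]
    return 0.5 * (n * np.log(2.0 * np.pi * np.e) + np.linalg.slogdet(cov)[1])

K = 5
idx = np.arange(K)
Ry = 0.8 ** np.abs(idx[:, None] - idx[None, :])   # assumed signal covariance
Rn = 0.3 * np.eye(K)                               # assumed noise covariance
Rz = Ry + Rn                                       # covariance of Z = Y + N
e = np.zeros(K)
e[-1] = 1.0                                        # the vector COL{0,...,0,1}

# Entropy route: H(x_K) = H(Y+N, y_K) - H(Y+N)
c = Ry @ e                                         # cross-covariance of Z and y_K
joint = np.block([[Rz, c[:, None]],
                  [c[None, :], np.array([[e @ Ry @ e]])]])
H_entropy_route = gaussian_entropy(joint) - gaussian_entropy(Rz)

# Mean square route: sigma^2 = I^T R_y (R_y + R_n)^{-1} R_n I
sigma2 = e @ Ry @ np.linalg.solve(Rz, Rn @ e)
H_mean_square = 0.5 * np.log(2.0 * np.pi * np.e * sigma2)
```

`H_entropy_route` and `H_mean_square` agree to rounding error for these test covariances.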

6.7  The  Sequential  Entropy  Theory  for  Feedback  Control  Systems 
6.7.1  Introduction 

The feedback problem has already been introduced in Chapter Four.
In this section the concept of Sequential Channel Transmittance will
be applied to the problem in order to determine a real time [physically
realizable] solution. The following lemma will be a cornerstone
of the Sequential Entropy Theorem for Feedback Control Systems.

Lemma

For the closed loop feedback control system of the form shown
in Figure 6.6 with arbitrary system parameters and arbitrary sensor
configuration, the joint entropy of the open loop signal and the
closed loop measurements is identically equal to the joint entropy
of the error and the closed loop measurements,

    H(x_K, Z) = H(u_K, Z)    (6.7.1)


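Proof:

The system equations of Figure 6.6 are

    1.  X = D(Y - V)

    2.  Z = C(B(X) + N)

    3.  V = F(Z)

    4.  S = B(D(Y))

Figure 6.6.  The Sequential Feedback Regulator

then

    x_K = u_K - d_K F(Z)

and

    p_{xz}(x_K, Z) = p_{uz}(x_K + d_K F(Z), Z).

The joint entropy H(x_K, Z) is

    H(x_K, Z) = -\int_{-\infty}^{\infty} d\beta \int_{-\infty}^{\infty} da_K  p_{xz}(a_K, \beta) LOG p_{xz}(a_K, \beta)

              = -\int_{-\infty}^{\infty} d\beta \int_{-\infty}^{\infty} da_K  p_{u/z}(a_K + d_K F(\beta)/\beta) p_z(\beta)
                LOG [ p_{u/z}(a_K + d_K F(\beta)/\beta) p_z(\beta) ]

then

    H(x_K, Z) = H(u_K, Z).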

6.7.2  The  Real  Time  Feedback  Entropy  Theorem 

The feedback control system that is examined is shown in Figure 6.6.
The components are all constrained to preserve information; in addition
B and  D are  taken  as  linear.  No  constraints  are  placed  on  the  forms 
of  the  signals  other  than  the  existence  of  an  entropy  measure.  The 
following  theorem  is  applicable. 

Theorem  6.7 

1.  For any realization of the filter function F(Z) the entropy
of a single scalar error always satisfies the equality

    I(x_K; Z) = H(x_K) - H(u_K) + I(u_K; Z),    (6.7.3)

and the inequality

    I(x_K; Z) \le H(x_K) - H(u_K) + I(u_K; BDY+N),    (6.7.3)

where U = DY.

As a consequence, the following inequality is also true,

    H(x_K) \ge H(u_K) - I(u_K; BDY+N) = H_0    (6.7.4)

where H_0 is an open loop function and it is independent of F(Z).

2.  The presence of the Sensor Sequential Channel Transmittance
term, I(u_K; BDY+N), in equation (6.7.4) implies that the improvement
in the system performance, because of the use of feedback, at
least in the case of an additive noise sensor, must be limited by
the open loop Sensor Sequential Channel Transmittance, i.e.,

    H(u_K) - H(x_K) \le I(u_K; BDY+N).

3.  Minimizing the mutual information I(x_K;Z) is equivalent to
minimizing the entropy of the error.

4.  If the filter function F(Z) is chosen so that the most recent
error, x_K, is independent of all the previous measurements Z, then

    I(x_K; Z) = 0

and the minimum error entropy,

    MIN_{F(Z)} {H(x_K)} = H_0 = H(u_K) - I(u_K; BDY+N)    (6.7.5)

is achieved.

Proof:

With the aid of the lemma of Section 6.7.1 the proof of these
statements is quite simple. Begin with the mutual information
between the error x_K and the vector Z,

    I(x_K; Z) = H(x_K) + H(Z) - H(x_K, Z)

              = H(x_K) + H(Z) - H(u_K, Z)

              = H(x_K) - H(u_K) + I(u_K; Z).

Because this system is constrained to be physically realizable the
equation

    Z = C(BDY + N - BDF(Z))

must have a unique single valued solution (see Section 4.3) given as

    Z = g_2(BDY + N).

The function g_2(·) is a transformation of BDY+N, the input, into Z,
the closed loop measurements. This transformation cannot "create"
information, therefore

    I(u_K; Z) \le I(u_K; BDY+N).

This leads to

    I(x_K; Z) \le H(x_K) - H(u_K) + I(u_K; BDY+N).

There is equality when the system transformation of (BDY+N) into Z
preserves information.

I(u_K; BDY+N) is the open loop Sensor Sequential Channel Transmittance.
It is the information between the scalar signal u_K and the open loop
measurement vector BDY+N.

Since mutual information is non-negative, the inequality

    H(x_K) \ge H(u_K) - I(u_K; BDY+N)

also follows. The achievement of the equality obviously corresponds
to minimum error entropy and this must correspond to

    I(x_K; Z) = 0.    (6.7.6)

If u_K is interpreted as the system output with an open feedback
path then

    H(u_K) - H(x_K)

must  be  the  system  entropy  improvement  that  results  specifically 
from  the  use  of  feedback.  It  is  truly  interesting  to  note  that 
this  system  improvement  is  bounded  from  above  by  the  open  loop  sensor 
properties  of  the  device  that  measures  u^.  Without  a doubt  this  is 
a radical  new  approach  to  feedback  theory.  This  result  is  significant 
because  it  implies  that  optimum  performance  for  feedback  systems 
may  be  calculated  without  either 

1.  calculating  the  optimization  network,  or, 

2.  completing  the  feedback  path  and  determining  the 
closed  loop  signal  entropies. 
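
The step in the proof that says the transformation of BDY+N into Z cannot create information can be illustrated with Gaussian variables. This is a sketch under assumed conditions that go beyond the report: B = D = I so that u_K = y_K and the open loop measurement is S = Y + N, the closed loop map is replaced by a fixed linear map G, and all names and covariance values are hypothetical.

```python
import numpy as np

def info_scalar_vs_vector(var_u, cross, cov_vec):
    """I(u; V) in nats for a jointly Gaussian scalar u and vector V."""
    cond_var = var_u - cross @ np.linalg.solve(cov_vec, cross)
    return 0.5 * np.log(var_u / cond_var)

K = 5
idx = np.arange(K)
Ry = 0.8 ** np.abs(idx[:, None] - idx[None, :])   # assumed signal covariance
Rs = Ry + 0.3 * np.eye(K)                          # covariance of S = Y + N
c = Ry[:, -1]                                      # cross-covariance of S and u_K
var_u = Ry[-1, -1]

def info_after(G):
    """I(u_K; G S) for a linear map G applied to the open loop measurements."""
    return info_scalar_vs_vector(var_u, G @ c, G @ Rs @ G.T)

I_open = info_scalar_vs_vector(var_u, c, Rs)
I_invertible = info_after(np.tril(np.ones((K, K))))   # causal running sum, invertible
I_lossy = info_after(np.eye(K)[-1:, :])               # keeps only the newest sample
```

An invertible (information preserving) map leaves the mutual information unchanged, so `I_invertible` equals `I_open`, while the map that discards samples gives the strictly smaller `I_lossy`.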

6.8  A Generalized  Entropy  Approach  to  Feedback  Control 

In the special case where B and D are linear and g_2(·) is
information preserving it happens that

    I(u_K; Z) = I(u_K; BDY+N)

but then it is also true that

    H(u_K/Z) = H(u_K/BDY+N).

If this equation were true for all types of sensors then the general,
completely nonlinear, feedback problem would be solved.

It is not apparent at this time how the linear constraints on
B and D in the feedback control problem may be relaxed. The following
theorem is proposed as a step in that direction.

Theorem  6.8 

If a sensor has the property that its closed loop conditional
entropy equals the open loop conditional entropy, i.e., if

    H(x_K/Z) = H(u_K/Z_o)    (6.8.1)

where Z is the closed loop output of the sensor and Z_o is the open
loop output of the sensor when the input to the sensor is the vector
DY, then

    H(u_K) - H(x_K) = I(Z_o; u_K) - I(x_K; Z).    (6.8.2)

I(Z_o; u_K) is the open loop Sensor Sequential Channel Transmittance.

In addition

    H(u_K) - H(x_K) \le I(Z_o; u_K).    (6.8.3)

Note:  For  the  system  studied  in  Section  6.7, 

and 

    Z_o = BDY+N.

Therefore it is obvious from the preceding work that at least the
linear prefiltering sensor with additive noise satisfies this
conditional entropy constraint. Whether or not any significantly
different sensors also satisfy this constraint is not yet known.

Proof:

    I(x_K; Z) = H(x_K) - H(x_K/Z)

              = H(x_K) - H(u_K/Z_o)

              = H(x_K) - H(u_K) + H(u_K) - H(u_K/Z_o)

              = H(x_K) - H(u_K) + I(u_K; Z_o)

or finally, since I(x_K;Z) \ge 0,

    H(u_K) - H(x_K) \le I(u_K; Z_o).    Q.E.D.

Thus, under the assumptions of Theorem 6.8 it is quite sufficient to
consider the sensor as a device for transmitting information, and its
capability for doing so is described completely by the mutual
information between the input and output. The actual form of the sensor
model is immaterial. It seems reasonable to conjecture that this is
true in general, but the proof of this conjecture has not been attained
for arbitrary sensors.

In the real time feedback problem, the entropy solution may be
shown to be very similar in form to results obtained for the
Gaussian-linear system. This is stated as a corollary to Theorem 6.7.


6.8.1  Corollary  to  Theorem  6.7 

For any feedback system, having arbitrary components and
unconstrained sensor models, the error entropy is always bounded by the
conditional entropy of the error given the measurements, i.e.,

    H(x_K) \ge H(u_K/Z).    (6.8.4)

The equality holds if and only if

    I(x_K; Z) = 0.

Proof:

Equation (6.7.3) was derived without any constraints being
imposed on the system elements,

    I(x_K; Z) = H(x_K) - H(u_K) + I(u_K; Z).    (6.8.5)

Rearranging this equation yields

    H(x_K) = H(u_K) - H(u_K) + H(u_K/Z) + I(x_K; Z)

and, since

    I(x_K; Z) \ge 0,

the corollary is proven.


uhl  urun  riwA^ooxi^a 


7.1  Discussion of the Difficulties in Solving Time Continuous
Systems with Entropy


Barring  consideration  of  the  fact  that  so  far  useful  results  for 
the  calculation  of  time  continuous  mutual  information  have  only  been 
obtained  when  the  processes  involved  are  Gaussian,  the  extension  of 
entropy  analysis,  as  described  in  the  previous  chapters  of  this 
dissertation,  to  time  continuous  systems  is  simultaneously  easy  and 
difficult.  It  is  easy  because  all  of  the  theorems  in  the  continuous 
case  are  proven  using  only  the  properties  of  mutual  information  and 
these properties should hold in the continuous as well as the discrete
case. It is difficult because there does not exist a fundamental
understanding of continuous time entropy. Undoubtedly, the
poor  understanding  of  the  properties  of  this  type  of  entropy  is  due 
in  no  small  part  to  the  fact  that  up  until  now  there  has  been  no 
practical  need  for  a continuous  time  entropy  measure.  Probably  the 
spectre  of  infinite  entropy  has  so  intimidated  researchers  that  they 
have  not  even  made  strong  attempts  to  find  useful  applications.  The 
following  very  simple  example  opens  up  the  possibility  of  just  such 
a useful  application.  The  reader  is  cautioned  to  realize  that, 
since  only  Gaussian  variables  can  be  studied,  the  results  only 
indicate  a potential  on  the  part  of  entropy  to  solve  certain  broader 
time  continuous  problems. 


The problem considered is the estimation of the magnitude of a
Gaussian random D.C. signal in additive stationary Gaussian white
noise. For this elementary problem the entropies of all the individual
signal quantities exist, so that it would be no great difficulty to
find the entropy of the estimation error, term by term, from the
familiar equation:

    H(x(t)) = H(Dy(t)) + H(n(t)) - H(By(t) + n(t)).    (7.1.1)

However, realistic situations exist where some or all of these
entropies cannot be found. An alternate approach is suggested by
the technique used in the proof of Theorem 3.5. There the concept
of Channel Transmittance is utilized to obtain equation (3.5.3),
rewritten here for time continuous processes,

    H(Dy(t)) - H(x(t)) \le I(y(t); z(t))    (7.1.2)

and since I(y;z) almost always exists and can be found, it therefore
follows that the bound on the improvement in the error entropy
for any continuous estimating system almost always may be calculated.
Of course, no absolute measure of the system entropy performance can
now be obtained since H(x) cannot be considered as a reliable
absolute measure in continuous time situations, and no other absolute
measure of the error will result from analysis.

7.2  Example 

Consider the D.C. signal

$$ s(t) = a, \quad 0 < t < T, \tag{7.2.1} $$

where "a" is a zero mean random variable that is constant over the
specified time interval and has the variance

$$ \mathrm{var}\{a\} = \sigma^2. \tag{7.2.2} $$


Estimates of "a" are to be made using the noisy measurements z(t),

$$ z(t) = a + n(t), \quad 0 < t < T, $$

where the random signal n(t) is a zero mean stationary white noise
process independent of "a" and having the covariance function

$$ E\{n(t_1)\, n(t_2)\} = N_0\, \delta(t_2 - t_1). \tag{7.2.3} $$


If "a" is estimated using

$$ \hat{a} = \int_0^T h(T-\tau)\, z(\tau)\, d\tau, \tag{7.2.4} $$

then according to the Wiener-Hopf equation [42, p.408], the optimum
estimating filter must satisfy

$$ E\{a\, z(t)\} = \int_0^T h(T-\tau)\, E\{z(\tau)\, z(t)\}\, d\tau, \tag{7.2.5} $$

and therefore

$$ \sigma^2 = \sigma^2 \int_0^T h(T-\tau)\, d\tau + N_0\, h(T-t), \quad \forall\, t \in [0,T]. \tag{7.2.6} $$


The only filter function h(T-t) that solves this equation is a
constant, i.e.,

$$ h(T-t) = \frac{\sigma^2}{N_0 + \sigma^2 T}. \tag{7.2.7} $$

After combining equation (7.2.7) with equation (7.2.4) the optimum
estimate for "a" is the weighted integral of the measurements, i.e.,

$$ \hat{a} = \int_0^T \frac{\sigma^2}{\sigma^2 T + N_0}\, z(\tau)\, d\tau. \tag{7.2.8} $$


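The variance of the error in estimating "a" is found as follows:

$$
\begin{aligned}
E\{(a-\hat{a})^2\} &= E\left\{\left(a - \frac{\sigma^2}{\sigma^2 T + N_0} \int_0^T z(\tau)\, d\tau\right)^2\right\} \\
&= \sigma^2 - \frac{2(\sigma^2)^2 T}{\sigma^2 T + N_0} + \left(\frac{\sigma^2}{\sigma^2 T + N_0}\right)^2 E\left\{\left(\int_0^T z(\tau)\, d\tau\right)^2\right\} \\
&= \sigma^2 - \frac{2(\sigma^2)^2 T}{\sigma^2 T + N_0} + \frac{(\sigma^2)^2 T}{\sigma^2 T + N_0} \\
&= \frac{\sigma^2 N_0}{\sigma^2 T + N_0}.
\end{aligned}
\tag{7.2.9}
$$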
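As a numerical cross-check of equations (7.2.6), (7.2.7) and (7.2.9), the following minimal Python sketch evaluates the constant filter and the resulting error variance. The function names and the sample values of the signal variance, noise level and interval length are illustrative assumptions, not part of the report. The moments used in `error_variance` assume the white noise covariance of (7.2.3).

```python
def optimum_gain(sigma2, n0, T):
    """Constant filter value h of (7.2.7): sigma^2 / (N0 + sigma^2 T)."""
    return sigma2 / (n0 + sigma2 * T)


def wiener_hopf_residual(h, sigma2, n0, T):
    """Left side minus right side of (7.2.6) for a constant filter h."""
    return sigma2 - (sigma2 * h * T + n0 * h)


def error_variance(h, sigma2, n0, T):
    """E{(a - a_hat)^2} when a_hat = h * (integral of z over [0, T]).

    Uses E{a * int z} = sigma2*T and E{(int z)^2} = sigma2*T^2 + n0*T,
    which follow from z(t) = a + n(t) with white noise of level n0.
    """
    return sigma2 - 2.0 * h * sigma2 * T + h * h * (sigma2 * T * T + n0 * T)


if __name__ == "__main__":
    sigma2, n0, T = 4.0, 2.0, 3.0  # illustrative values only
    h = optimum_gain(sigma2, n0, T)
    print("h =", h)
    print("Wiener-Hopf residual =", wiener_hopf_residual(h, sigma2, n0, T))
    print("error variance =", error_variance(h, sigma2, n0, T))
    print("closed form (7.2.9) =", sigma2 * n0 / (sigma2 * T + n0))
```

Perturbing `h` away from the value returned by `optimum_gain` increases `error_variance`, which is one way to see that the constant filter is the minimizing choice among constant filters.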


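Using the properties of Gaussian random variables the entropy of the
estimation error must be

$$ H(a-\hat{a}) = \tfrac{1}{2}\, \mathrm{LOG}\left( \frac{\sigma^2 N_0}{\sigma^2 T + N_0}\, 2\pi e \right). \tag{7.2.10} $$

This, taken together with the entropy of "a"

$$ H(a) = \tfrac{1}{2}\, \mathrm{LOG}\left( \sigma^2\, 2\pi e \right), \tag{7.2.11} $$

proves that the change in the entropy of "a" because an estimation
is made is

$$ H(a) - H(a-\hat{a}) = \tfrac{1}{2}\, \mathrm{LOG}\left( 1 + \frac{\sigma^2 T}{N_0} \right). \tag{7.2.12} $$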


The same result may be obtained from a direct application of the
Estimation Theorem, 3.5 (applied to continuous time systems and
using the work of Hyang [33, p.59]).

According to Hyang,

$$ I_T(s(t)+n(t);\, s(t)) = I_T(a+n(t);\, a) = \tfrac{1}{2} \sum_k \mathrm{LOG}(1+\lambda_k), \tag{7.2.13} $$


where the subscript T denotes that the time interval over which the
processes are defined is [0,T]. The $\lambda_k$ are found from the
projection of the signal onto an eigenfunction space spanned by the
set of functions $F_k(t)$, i.e.,

$$ E\{\langle F_k, s \rangle^2\} = \lambda_k, \tag{7.2.14} $$

where the inner product is defined as

$$ \langle F_k, s \rangle = \int_0^T s(\tau)\, F_k(\tau)\, d\tau. \tag{7.2.15} $$


The eigenfunctions $F_k$ satisfy the integral equation

$$ \lambda_k \int_0^T E\{n(t_1)\, n(t_2)\}\, F_k(t_2)\, dt_2 = \int_0^T E\{a^2\}\, F_k(t_2)\, dt_2. \tag{7.2.16} $$


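For the simple D.C. process in white noise the only eigenfunction, $F_1$,
is a constant, and

$$ \lambda_1 = \frac{\sigma^2 T}{N_0}, \qquad \lambda_k = 0, \quad k = 2, 3, \ldots \tag{7.2.17} $$

Then by equations (7.2.13) and (7.2.17)

$$ I_T(a+n(t);\, a) = \tfrac{1}{2}\, \mathrm{LOG}\left( 1 + \frac{\sigma^2 T}{N_0} \right), \tag{7.2.18} $$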


which,  of  course,  completely  agrees  with  the  informational  quantity 
of  equation  (7.2.12),  which  was  calculated  directly  from  mean  square 
error  analysis. 
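The agreement between the eigenvalue route and the mean square error route can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the integral equation for the eigenfunctions is discretized on n points, with each row of the kernel carrying the factor sigma^2 dt / N0; natural logarithms are used for LOG; all function names and sample values are hypothetical.

```python
import math


def gaussian_entropy(variance):
    """Entropy (in nats) of a scalar Gaussian with the given variance."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


def information_from_eigenvalues(eigenvalues):
    """One half the sum of LOG(1 + lambda_k), as in (7.2.13)."""
    return 0.5 * sum(math.log(1.0 + lam) for lam in eigenvalues)


def discretized_kernel(sigma2, n0, T, n):
    """n-point discretization of the eigenfunction equation.

    Applying the kernel to a sampled function F gives
    (sigma2 * dt / n0) * sum_j F_j in every component.
    """
    dt = T / n
    return [[sigma2 * dt / n0] * n for _ in range(n)]


def apply(matrix, vector):
    """Matrix-vector product for plain Python lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


if __name__ == "__main__":
    sigma2, n0, T, n = 4.0, 2.0, 3.0, 50  # illustrative values only
    kernel = discretized_kernel(sigma2, n0, T, n)
    lam1 = apply(kernel, [1.0] * n)[0]  # constant eigenfunction
    info = information_from_eigenvalues([lam1])
    err_var = sigma2 * n0 / (sigma2 * T + n0)
    print("lambda_1 =", lam1, "expected", sigma2 * T / n0)
    print("I_T =", info)
    print("H(a) - H(a - a_hat) =", gaussian_entropy(sigma2) - gaussian_entropy(err_var))
```

Any sampled function whose components sum to zero is mapped to zero by this kernel, which is the discrete counterpart of the remaining eigenvalues vanishing.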


7.3  Conclusions 

There  is  no  doubt  that  the  above  information  theoretic  procedure 
may  be  applied  to  more  complex  time-continuous  processes,  but  only 
if  all  the  signals  are  Gaussian  and  their  covariance  functions  have 
Karhunen-Loève expansions. Fortunately one can expect that the
Gaussian  restriction  will  be  removed  with  further  research,  so  the 
only  real  problem  to  consider  is  whether  equation  (7.1.2),  (repeated 
here) 


$$ H(Dy(t)) - H(x(t)) = I(y(t); z(t)) \tag{7.2.19} $$

has any meaning, and not whether the information, I(y(t);z(t)), can
be calculated. In this context, I(y(t);z(t)) still retains its interpretation
as a Channel Transmittance so that even in the continuous
case  it  is  the  Sensor  Channel  Transmittance  that  governs  the  ability 
of  the  system  to  improve  the  entropy  uncertainty  of  the  signal. 

Equation  (7.2.19)  is  effective  for  comparing  the  performances  of 
different  sensors  used  in  optimum  configurations.  Unlike  the  equations 
for  discrete  time,  the  continuous  entropy  bounding  equations  can  not 
be  rewritten  as  variance  bounding  equations.  This  is  because  there 




CHAPTER  EIGHT 


ENTROPY  ANALYSIS  OF  ADAPTIVE  CONTROL 
8.1  Introduction 

Underlying  the  whole  theory  of  adaptive  control  is  the  idea 
that  the  output  of  the  vaguely  described  system  can  be  examined  in 
order  to  obtain  a better  description  of  the  system  and  ultimately  to 
use  this  information  to  improve  the  control  of  the  system.  Implied 
in  this  procedure  is  the  concept  that  the  system  output  contains 
information  about  the  system  parameters  which  might  be  profitably 
used to estimate those parameters. If the experience gained in
this dissertation for signal estimation is any indication, it
should follow that entropy is a logical tool for describing parameter
estimation. In the example presented below, a very simple
situation is examined. The system output is a linear function of
the system parameters which are to be estimated. This is not the
typical type of problem encountered in adaptive control; however,
this model does represent a system with unknown initial conditions
and could still be valuable. A solution is obtained by imbedding
this  problem  into  the  estimation  problem  and  applying  the  estimation 
Theorem  3.4.  Even  though  severe  restrictions  are  encountered  in 
the  useful  application  of  entropy  to  the  adaptive  control  problem, 
it  is  not  overly  optimistic  to  expect  that  they  will  soon  be  relaxed. 
In any case, the important conclusion is that the amount of improvement
in the entropy uncertainty of a system parameter is limited by
the properties of the sensor measuring the output and the properties
of the output function that is measured.

8.2  A Parameter  Estimation  Problem 

Consider the following problem: The scalar output of a discrete
time system at time $t_i$ (i'th instant) is

$$ w_i = M_i^T \alpha + n_i, \quad i = 1, \ldots, K $$

where $M_i$ is a column vector, $n_i$ is additive noise independent of $\alpha$,
and $\alpha$ is a column vector of size K of parameters upon which the output
depends linearly. It is desired to estimate $\alpha$. This problem immediately
falls into the context of the estimation problem, Section 3.4,
especially when the following correspondences at the Kth instant are
noted:

$Y = \alpha$

$B = \{M_1, M_2, \ldots, M_K\}^T$

$C = I$

$D = I$

$Z = B\alpha + N$

$U = \hat{\alpha}$

$V = \hat{\alpha}$

$X = \alpha - \hat{\alpha}$

It  then  follows  that  the  improvement  in  the  entropy  uncertainty  of 
the  system  parameters  is  limited  directly  by  the  Channel  Transmittance 
property of the sensor. This is a new approach to adaptive control
analysis.


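According to the results of the estimation theorem, the entropy
of the error in the estimation of $\alpha$ is bounded by

$$ H(X) \ge H(\alpha) + H(N) - H(B\alpha + N) $$

where $\alpha$ and N are zero mean Gaussian random vectors with covariance
matrices

$$ E\{\alpha\, \alpha^T\} = R_\alpha, \qquad E\{N N^T\} = R_N. $$

The entropy equation becomes

$$
\begin{aligned}
H(X) \ge{} & \tfrac{1}{2}\, \mathrm{LOG}\left\{ (2\pi e)^K\, \mathrm{DET}[R_\alpha] \right\} \\
& + \tfrac{1}{2}\, \mathrm{LOG}\left\{ (2\pi e)^K\, \mathrm{DET}[R_N] \right\} \\
& - \tfrac{1}{2}\, \mathrm{LOG}\left\{ (2\pi e)^K\, \mathrm{DET}[B R_\alpha B^T + R_N] \right\},
\end{aligned}
$$

or, when using optimum estimates

$$ H(X) = \tfrac{1}{2}\, \mathrm{LOG}\left\{ (2\pi e)^K\, \frac{\mathrm{DET}[R_\alpha]\, \mathrm{DET}[R_N]}{\mathrm{DET}[B R_\alpha B^T + R_N]} \right\}. $$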
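The Gaussian entropy bound can be evaluated directly from the three covariance matrices. The sketch below is a minimal illustration, assuming the standard Gaussian vector entropy of one half LOG{(2 pi e)^K DET[R]} in nats; the helper names and the 2 by 2 sample matrices are hypothetical and chosen only so the result is easy to verify by hand.

```python
import math


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
    return result


def mat_mul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    return [list(col) for col in zip(*x)]


def mat_add(x, y):
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def gaussian_entropy(cov):
    """Entropy in nats of a zero mean Gaussian vector with covariance cov."""
    k = len(cov)
    return 0.5 * (k * math.log(2.0 * math.pi * math.e) + math.log(det(cov)))


def entropy_bound(B, R_alpha, R_n):
    """H(alpha) + H(N) - H(B alpha + N) for Gaussian alpha and N."""
    S = mat_add(mat_mul(mat_mul(B, R_alpha), transpose(B)), R_n)
    return gaussian_entropy(R_alpha) + gaussian_entropy(R_n) - gaussian_entropy(S)


if __name__ == "__main__":
    B = [[1.0, 0.0], [0.0, 1.0]]          # illustrative values only
    R_alpha = [[4.0, 0.0], [0.0, 1.0]]
    R_n = [[1.0, 0.0], [0.0, 1.0]]
    print("lower bound on H(X):", entropy_bound(B, R_alpha, R_n))
```

With these diagonal matrices the bound splits into two scalar problems with error variances 4/5 and 1/2, so it can be compared against the scalar Gaussian entropy formula.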

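This same result may be obtained in the following manner from a point
of view based only on mean square error analysis. If the estimation
of $\alpha$ takes the form

$$ \hat{\alpha} = A_0 Z $$

then, according to the principle of orthogonality [42, p.218],

$$ E\{(\alpha - A_0 Z)\, Z^T\} = 0 $$

or $A_0$ must satisfy

$$ R_\alpha B^T = A_0\, [B R_\alpha B^T + R_N]. $$

Using the value of $A_0$ as

$$ A_0 = [R_\alpha B^T]\, [B R_\alpha B^T + R_N]^{-1} $$

it is found directly that

$$ E\{(\alpha - \hat{\alpha})(\alpha - \hat{\alpha})^T\} = R_\alpha - A_0 B R_\alpha. $$

It follows, from the constraint that $\mathrm{DET}[B] = \mathrm{DET}[B^T] = \mathrm{DET}[B^{-1}] \neq 0$, that

$$ H(\alpha - \hat{\alpha}) = \tfrac{1}{2}\, \mathrm{LOG}\left\{ (2\pi e)^K\, \frac{\mathrm{DET}[R_\alpha]\, \mathrm{DET}[R_N]}{\mathrm{DET}[B R_\alpha B^T + R_N]} \right\}. $$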

This  example  demonstrates  a possible  potential  on  the  part  of  entropy 
to be a useful analysis tool for adaptive control problems. However,
in this very simple example there is a significant limitation that
requires further research to overcome. The significant limitation
is the form of the constraint on the operation B, i.e.,


$$ \left| \frac{dB(Y)}{dY} \right| \neq 0. $$


This  restriction  on  the  Jacobian  of  the  transformation  B(Y)  implies 
two  conditions: 

1.  B is a K-valued function of K variables.

2.  B preserves information.

Condition 1 insists that exactly K measurements be made if K parameters
are to be estimated. This is not a common condition since most
often redundant measurements are taken. Constraining B(Y) in this
manner was necessary for the proof of Theorem 3.4. However, this is
not a requirement for the proof of Theorem 8.3 below, which allows
both redundant measurements and nonlinear plant outputs.

8.3  Entropy  Solution  of  the  Identification  Problem 

Theorem  8.3 

The entropy of the error in identifying an m dimensional
parameter set, $\theta_m$, satisfies

$$ H(X_k) \ge H(\theta_m) - I(\theta_m; Z_k) $$

where the identification system has the form shown in Figure 8.1.

In general, the output signal of the plant to be identified is a
nonlinear function of the parameters $\theta_m$. This nonlinear operation
has been included in the sensor model, so that $I(\theta_m; Z_k)$ is recognized
to be the Channel Transmittance of the sensor. Obviously the improvement
of the parameter entropy from $H(\theta_m)$ to $H(X_k)$ can never exceed
the Sensor Channel Transmittance.


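Proof:

$$ X_k = \theta_m - F(Z_k) $$

$$ p(X_k / Z_k) = p_{\theta/Z}(X_k + F(Z_k) / Z_k) $$

or

$$
\begin{aligned}
H(X_k / Z_k) &= H(\theta_m / Z_k) \\
I(\theta_m; Z_k) &= H(\theta_m) + H(Z_k) - H(\theta_m, Z_k) \\
&= H(\theta_m) - H(X_k) + I(X_k; Z_k) \\
H(X_k) &= H(\theta_m) + I(X_k; Z_k) - I(\theta_m; Z_k)
\end{aligned}
$$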

which  proves  the  theorem. 
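A simple Gaussian case illustrates the bound of Theorem 8.3 with redundant measurements. The sketch below is a minimal illustration under stated assumptions: a single scalar parameter, several linear measurements Z_i = c_i theta + n_i with independent Gaussian noise, and a linear identifier formed as a weighted sum of the measurements. The theorem itself allows nonlinear plant outputs; the linear Gaussian setting is chosen here only because every entropy has a closed form. All names and sample values are hypothetical.

```python
import math


def gaussian_entropy(variance):
    """Entropy (in nats) of a scalar Gaussian with the given variance."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


def channel_transmittance(sigma2, gains, noise_vars):
    """I(theta; Z) for Z_i = c_i * theta + n_i, independent Gaussian noise."""
    snr = sigma2 * sum(c * c / v for c, v in zip(gains, noise_vars))
    return 0.5 * math.log(1.0 + snr)


def error_entropy(sigma2, gains, noise_vars, weights):
    """H(X) for X = theta - sum_i w_i * Z_i, which is Gaussian here."""
    bias = 1.0 - sum(w * c for w, c in zip(weights, gains))
    var = bias * bias * sigma2 + sum(w * w * v for w, v in zip(weights, noise_vars))
    return gaussian_entropy(var)


def optimum_weights(sigma2, gains, noise_vars):
    """Minimum mean square error weights for the linear Gaussian model."""
    snr = sigma2 * sum(c * c / v for c, v in zip(gains, noise_vars))
    return [sigma2 * c / v / (1.0 + snr) for c, v in zip(gains, noise_vars)]


if __name__ == "__main__":
    sigma2 = 2.0                      # illustrative values only
    gains = [1.0, 0.5, 2.0]           # three measurements of one parameter
    noise_vars = [1.0, 0.25, 4.0]
    bound = gaussian_entropy(sigma2) - channel_transmittance(sigma2, gains, noise_vars)
    for weights in ([0.3, 0.0, 0.0], [0.1, 0.1, 0.1],
                    optimum_weights(sigma2, gains, noise_vars)):
        print(weights, error_entropy(sigma2, gains, noise_vars, weights), ">=", bound)
```

In this setting the optimum weights meet the bound with equality, while any other weighting leaves the error entropy above it.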

8.4  Conclusions 

The really significant result would be to "close" the identification
system through the use of a control scheme and show that
the  adaptive  system  performance  is  bounded  by  the  Sensor  Channel 
Transmittance.  The  difficulty  lies  in  the  fact  that  the  only  important 
adaptive  systems  are  those  having  outputs  that  are  nonlinear  functions 
of  the  unknown  parameters.  Unfortunately  the  entropy  solution  of  a 
feedback  system  with  that  much  generality  is  yet  to  be  determined. 




CHAPTER  NINE 




SUMMARY OF RESULTS AND AREAS FOR FUTURE RESEARCH
9.1  Summary of Results

The  application  of  the  entropy  concepts  to  the  analysis  of 
sampled data feed-forward and sampled data feedback systems has led
to some interesting and useful results. Through the use of the entropy
as a performance measure, an information quantity, the Sensor
Channel Transmittance, may be defined, which forms the basis of
a far reaching analysis technique. For example, in the estimation
problem with either real time or non-real time and with no constraints
on any of the system elements or signals (other than the assumed
existence of the signal entropy), it is the Sensor Channel Transmittance,
or a close relative of it, which bounds the estimating
system performance. Moreover, the form of the sensor need not be
specified and the channel property may even be calculated experimentally
if necessary. Under certain conditions, the estimation
filter may be chosen in such a way that an optimum system is achieved.
The performance improvement of the optimum system is then equal to
the Sensor Channel Transmittance. This result is important for
two reasons: first, it justifies the use of entropy analysis for
investigating  systems,  and  second,  entropy  optimization  performance 
limits  are  known  independent  of  whether  or  not  the  necessary  optimizing 
filter  is  determined  in  advance. 

As a by-product of the study leading to the major results of this
dissertation, several philosophically pleasing observations were made.
The most interesting observation was derived from a study of the
equation

$$ H(X) \ge H(D(Y)/Z). $$

This expression is a direct analog to

$$ \mathrm{VAR}\{X\} \ge \mathrm{VAR}\{D(Y)/Z\} $$

and contributes to the belief that entropy and variance are each
special cases of a more general criterion function, uncertainty.

The advantages of using an uncertainty function other than variance
have already been demonstrated by this dissertation. For example, a
monotonic decrease of error variance with time can not easily be shown,
but with entropy the decrease is easily proven. At a more practical
level, the form of the results is always such that they reduce to
well known equations for Gaussian variables and to acceptable
variance inequalities for non-Gaussian random variables. Therefore,
if  for  no  other  reason,  entropy  analysis  is  justified  because  it 
always  leads,  very  quickly,  to  mean  square  error  bounds. 
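One way to see how an entropy statement turns into a mean square error bound is through the standard fact that, among all densities with a given variance, the Gaussian has the largest entropy. The sketch below is a minimal illustration of that conversion and is not taken from the report: it assumes entropies measured in nats, and the function names and the uniform density used as the non-Gaussian example are hypothetical choices.

```python
import math


def gaussian_entropy(variance):
    """Entropy (in nats) of a scalar Gaussian with the given variance."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


def variance_lower_bound(entropy_nats):
    """Smallest variance compatible with a given entropy.

    Inverts H <= 0.5 * log(2 * pi * e * variance), which holds for any
    scalar density with finite variance.
    """
    return math.exp(2.0 * entropy_nats) / (2.0 * math.pi * math.e)


def uniform_entropy(width):
    """Entropy (in nats) of a uniform density on an interval of this width."""
    return math.log(width)


def uniform_variance(width):
    return width * width / 12.0


if __name__ == "__main__":
    width = 3.0  # illustrative value only
    print("uniform variance:      ", uniform_variance(width))
    print("bound from its entropy:", variance_lower_bound(uniform_entropy(width)))
    print("Gaussian case:         ", variance_lower_bound(gaussian_entropy(2.5)))
```

For the Gaussian the bound is met exactly, while for the uniform density it is a strict inequality, which matches the distinction drawn above between Gaussian equations and non-Gaussian variance inequalities.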

It  is  not  possible  to  make  such  broad  statements  about  feedback 
control  systems.  The  most  critical  assumption  made  for  the  solution 
of  this  problem  is  that  the  sensor  must  have  a linear  prefilter  and 
additive  noise.  Nonetheless  the  same  conclusions  regarding  the 
importance  of  Sensor  Channel  Transmittance  as  a performance  bound, 
the  significance  of  entropy  as  one  type  of  uncertainty,  and  the 
reduction  in  the  special  case  to  accepted  Gaussian  results,  may  still 
be made. The real power of the feedback entropy bounds is that not
only are they independent of the feedback filter but they depend only
on the open loop behavior of the sensor. If a filter to achieve
optimum performance exists, then the optimum performance achieves
the entropy bound and that bound may be determined from the Sensor
Channel Transmittance without first having to calculate the optimum
feedback function to close the loop.

Less  conclusive  but  just  as  satisfying  are  the  results  derived 
for  adaptive  control  systems  and  continuous  time  systems.  In  both 
cases,  results  are  obtained  that  justify  the  potential  of  entropy 
analysis.  Theoretically,  the  continuous  time  problem  has  been 
solved  and  the  solutions  are  identical  to  those  obtained  for  sampled 
data  systems,  i.e.,  the  improvement  in  system  performance  because  of 
a feed-forward  (or  feedback)  path  is  limited  by  the  continuous 
Sensor Channel Transmittance of that path. The difficulty is in
defining continuous time entropy and not in making use of it.
Similarly, when the unknown parameters of an object are to be determined
for use by a controller in an adaptive system, the ability
to determine those parameters is limited by the Channel Transmittance
of the sensor being used.

9.2  Areas  for  Future  Research 

There  are  three  critical  problems  restricting  the  application  of 
entropy  analysis: 

1.  The assumption of a feedback sensor having a linear prefilter.

2.  The assumption of the existence of a continuous time entropy
function.

3.  The fact that entropy measurement is not a sophisticated art.

Relaxation of the constraints on the feedback sensor is desirable
because several important feedback control sensors can not be
modelled exactly in the form that was used for the feedback theorem.
The human operator is an example of a sensor that has no such model.
It would be a great advantage if a theorem could be derived for
bounding the performance of feedback systems which depends only on
the channel property of the sensor. If such a generalization is not
possible, it would be convenient to at least allow for nonlinear
sensor prefiltering. This result in a vector context, together with
Chapter Eight, would immediately solve the problem of determining
the performance of a "closed loop" adaptive controller. It is
expected that it is possible to make a statement of the form:

"The reduction in the output entropy of a plant which is
controlled by an adaptive system is bounded by the Channel
Transmittance of the sensor used to adaptively control the
plant."

If the conclusions of this dissertation are accepted as being
important, then there is no question that a continuous time entropy
measure  is  required.  That  such  a measure  would  be  put  to  immediate 
use  was  demonstrated  in  Chapter  Seven.  But,  until  a suitable 
entropy  function  is  defined,  these  results  can  not  be  interpreted  or 
used  effectively. 


There remains only one other fundamental problem with entropy
analysis and that is the newness of the concept. Systems, at present,
are not usually described in terms of information quantities so a
whole new set of system properties must be defined and measured.
This will involve designing instruments and algorithms to aid in
the determination of mutual information and entropy. This is an
area of interest that is virtually uninvestigated. Certainly if
entropy analysis is to achieve prominence, entropy measurements
must not be neglected.




REFERENCES 


1.  Shannon, C. E. and W. Weaver, The mathematical theory of
communication. The University of Illinois Press, Urbana,
Illinois, 1963.

2.  Feinstein, A., Foundations of information theory. McGraw-
Hill, New York, 1958.

3.  Khinchin, A. I., Mathematical foundations of information
theory (translated from the Russian by R. A. Silverman and
M. D. Friedman). Dover Publications, Inc., New York, 1957.

4.  Wyner, A. D., "The capacity of the band-limited channel."
The Bell system technical journal, 45:359-395, March, 1966.

5.  Ash, R. B., "Capacity and error bounds for a time-continuous
Gaussian channel." Information and control, 6:14-27, 1963.

6.  Lomnicki, Z. A. and S. K. Zaremba, "The asymptotic distributions
of estimators of the amount of transmitted information."
Information and control, 2:260-284, 1959.

7.  Abramson, N., Information theory and coding. McGraw-Hill,
New York, 1963.

8.  Fano, R. M., Transmission of information. The M.I.T. Press,
Cambridge, Mass., 1963.

9.  Ash, R. B., Information theory. Interscience Publishers,
New York, 1965.

10.  McMillan, B., "The basic theorems of information theory."
Annals of mathematical statistics, 24:196-219, 1953.

11.  Feinstein, A., "A new basic theorem of information theory."
I.R.E. transactions on information theory, 4:2-22, Sept.,
1954.

12.  McGill, W. J., "Multivariate information transmission."
I.R.E. transactions on information theory, 4:93-111, Sept.,
1954.

13.  Kolmogorov, A. N., "On the Shannon theory of information
transmission in the case of continuous signals." I.R.E.
transactions on information theory, 2:102-108, Dec., 1956.

14.  Chover, J., "On normalized entropy and the extensions of
a positive-definite function." Journal of mathematics and
mechanics, 10(6):927-945, 1961.

15.  Birch, J. J., "Approximations for the entropy for functions
of Markov chains." Annals of mathematical statistics, 33:930-
938, Sept., 1962.

16.  Gerrish, A. M., and P. M. Schultheiss, "Information rates of
non-Gaussian processes." I.R.E. transactions on information
theory, 10(4):265-271, Oct., 1964.

17.  Golomb, S. W., "A new derivation of the entropy expressions."
I.R.E. transactions on information theory, 7(3):166-167,
July, 1961.

18.  Karush, J., "A simple proof of an inequality of McMillan."
I.R.E. transactions on information theory, 7(2):118, April,
1961.

19.  Lindley, D. V., "On a measure of the information provided by
an experiment." Annals of mathematical statistics, 27:986-
1005, 1956.

20.  DeGroot, M. H., "Uncertainty, information, and sequential
experiments." The annals of mathematical statistics, 30(2):
404-419, June, 1962.

21.  Kelly, J. L., "A new interpretation of information rate." The
Bell system technical journal, 35:914-926, July, 1956.

22.  Elkind, J. I., "Transmission of information in simple
manual control systems." I.R.E. transactions on human
factors in electronics, 2(1):58-60, March, 1961.

23.  Foy, Wade H., Random parameters in linear systems, Ph.D.
in Engineering, the Johns Hopkins University, Baltimore,
Maryland, Dec., 1961.

24.  Balakrishnan, A. V., "Signal selection theory for space
communications channels." Advances in communications
systems (edited by A. V. Balakrishnan):1-31. Academic
Press, New York, 1965.

25.  Ovseevich, I. A., and M. S. Pinsker, "Evaluation of the
carrying capacity of a communication channel whose parameters
are random functions of time." Radio engineering,
12(10):54-62, 1957.

26.  Bishop, W. B. and B. L. Buchanan, "Message redundancy vs.
feedback for reducing message uncertainty." The I.R.E.
convention record, part 2:33-39, March, 1957.

27.  Varshaver, B. A., "The theory of signal transmission with
multiple discrete values." Radio engineering, 14(1):1-13,
Nov., 1959.

28.  Ovseevich, I. A. and M. S. Pinsker, "The speed of transmission
of information and the carrying capacity of a
multipath system and reception by linear operator
conversion method." Radio engineering, 14(3):13-29,
Jan., 1960.

29.  Good, I. J. and D. C. Doog, "A paradox concerning rate of
information." Information and control, 1:113-126, May, 1958.

30.  Swerling, P., "Paradoxes related to the rate of transmission
of information." Information and control, 3:351-359, Dec.,
1960.

31.  Gel'fand, I. M. and A. M. Yaglom, "Calculation of the amount
of information about a random function contained in
another such function." American math society translations,
series 2, 12:199-246, 1959.

32.  Pinsker, M. S., Information and information stability of
random variables and processes. (Translated from the Russian
by A. Feinstein), Holden-Day, San Francisco, 1964.

33.  Hyang, Robert Y., An information theory for time continuous
processes. Ph.D. in Engineering, Syracuse University,
Syracuse, New York, June, 1962.

34.  Balakrishnan, A. V., "Estimation and detection theory for
multiple stochastic processes." Journal of mathematical
analysis and applications, 1:386-410, Dec., 1960.

35.  Balakrishnan, A. V., "On a class of nonlinear estimation
problems." I.R.E. transactions on information theory,
10(4):314-320, Oct., 1964.

36.  Wiener, N., Cybernetics, 2nd edition, Wiley, New York, 1961.

37.  Garner, W. R. and W. J. McGill, "The relation between information
and variance analysis." Psychometrika, 21(3):219-228,
Sept., 1956.

38.  Swarup, Chaitanya, Some informational theoretical and
empirical techniques in statistical inference. Ph.D. in
Mathematics, The University of New Mexico, 1964.

39.  Krasovskiy, A. A., "Entropy stability of linear continuous
automatic control systems." Engineering cybernetics, 1(5):
10-16, 1963.

40.  Krasovskiy, A. A., "Variation in entropy of continuous
dynamic systems." Engineering cybernetics, 2(5):1-11,
Sept., 1964.

41.  Parzen, E., Modern probability theory and its applications,
Wiley, New York, 1960.

42.  Papoulis, A., Probability, random variables, and stochastic
processes, McGraw-Hill, New York, 1965.




